“Time hath not yet so dried this blood of mine nor age my invention”
Much Ado About Nothing. 
“ These old fellows 


Their blood is caked, ’tis cold, it seldom flows.” 
Timon of Athens. 


INTRODUCTION 


THE NOTION of health or normality is elusive, but its attributes
can be in part described by a distribution of values for structure 
and function under specified genetic and environmental conditions. In 
the past, preoccupation with the causes and manifestations of disease
has distracted attention from the urgent need for information from 
healthy people of all ages, and only in recent years has this been forth- 
coming. 

Many techniques of measurement are not sufficiently standardised, 
and their inherent errors are undetermined. In the case of clinical assess- 
ment, the bias associated with all subjective methods introduces large 
differences between observers, and individual observers are not always 
consistent. 

Where groups of individuals are being compared, it is not sufficiently 
realised that any particular type of measurement may be affected by a 
number of factors such as time (diurnal, day to day, menstrual, seasonal, 
growth and ageing), sex, race, nutrition, state of activity, environment 
and so forth. Such variation is not always controlled or even specified. 
Furthermore, because of the inherent variability of all biological materials,
tests of statistical significance are required when distributions or mean 
values are being compared. 

The increasing number of old people in our midst has focussed atten- 
tion on the anatomical, physiological, psychological and pathological 
changes occurring in old age. Although it is generally accepted that many 
characteristics change from infancy to maturity, it is sometimes assumed 
that from then onwards little or no further change is found. A healthy 
old man or woman is thus expected to conform to the ill-defined standards 
at present available for younger groups. The only valid control for a 
group of old people is another group of the same age. We prefer to 
accept a standard of health or normality based on the behaviour of an 
individual and his ability to fit into the society in which he lives, rather 
than on preconceived ideas of clinical signs and symptoms. 

During the last few decades the literature on blood pressure and the 
level of blood constituents of people in the higher age groups has accumulated,
but the choice of subjects and the analysis of the data have not
always permitted valid conclusions. The present study relates to data
collected over a number of years on 140 presumably healthy people of
60 to 104 years of age. Data from younger groups are presented for
comparison.

METHOD
Clinical and Technical 


The subjects included in this study were met in general medical 
practice and in homes for the aged in England. Those aged 50 to 59 
and many in the age group of 60 to 69 were volunteers. All were what 
is commonly regarded as “ normal ” or healthy for their age, being free 
from clinical signs and symptoms suggesting disease and up and about 
during most of the day. Those under 70 were engaged on some form 
of work or duty and those over 70 if not working, had some minor hobby 
or interest. Their diet was considered sufficient and balanced, but was 
not controlled in any way. 

All were kept under observation for a period of several weeks 
after which they were classified for “ fitness” for their particular age. 
The assessment was purely subjective and made by the same observer 
throughout the study. The factors taken into consideration were 
general appearance, behaviour and activity. The rating was: Class (A) 
= younger than age; Class (B) = corresponds to age; Class (C) = older
than age. On a subsequent occasion, a general clinical examination was 
carried out. After a preliminary chat to minimise nervousness the blood 
pressure was taken by a standard method with the subject seated. Oral 
temperatures were taken morning and evening for a week before blood 
sampling. In general the range of temperature corresponded more or 
less to that of younger adults. During the summer months a morning 
temperature up to 99°F was accepted as normal in the absence of clinical 
signs and symptoms. About 10% had a palpable liver but no spleno- 
megaly was found. Apart from this the findings were in general those 
expected in healthy old people. 

Blood samples were taken in the mornings (9 a.m. to noon) about 
2-4 hours after a light breakfast. After sitting quietly for about half 
an hour, 5 c.c. of venous blood were collected with minimum venous stasis 
and with the subject still seated. Heller oxalate mixture was used as 
anticoagulant. A Wintrobe haematocrit tube was filled with well mixed 
blood and readings of the erythrocyte sedimentation rate (E.S.R.) taken
after 1 and 3 hours. Only the former are recorded in this paper. The 
tube was spun at 3500 R. P. M. until constant packing volume occurred. 
Fragility of the red blood cells (R.B.C.) was estimated on unwashed
cells in carefully prepared saline solutions (0.30% to 0.58%) and the
results recorded after 4 hours at laboratory temperature. In a number
of cases white blood cells (W.B.C.) in the venous blood were counted.
Blood urea was estimated by a standard urease method, and in a number 
of cases the plasma cholesterol and whole blood chloride were estimated. 
Urine examination consisted of routine tests for glucose, protein, casts 
and specific gravity. In some instances the 24 hour urine volume was 
measured (non-catheter specimens) and a urea concentration test carried 
out. The same techniques were used throughout the study. 

Repeat examinations were carried out whenever possible, and in some 
individuals three examinations were made over a period of weeks or 
months. Diurnal variation has been minimised in this study by blood 
sampling in the mornings. It was not possible however to take seasonal 
or longer term variation into consideration. All individuals were kept 
under observation for a period of at least a year and some for 2 years or 
more. During the period of study two individuals over 90 years of age 
died but there was no opportunity for autopsy. 


Statistical 


The primary object of the analysis was to throw light on the physi- 
ological variation in old age, and for this purpose 5 particular com- 
parisons were regarded as being of special interest: (1) sex difference, 
(2) changes in mean value with ageing, (3) change in variability as 
indicated by the standard deviation (S.D.) with age, (4) difference 
between means and standard deviation found in the present study and 
those from younger groups, (5) correlations and regressions of certain 
pairs of measurements. Most of the techniques used are perfectly 
standard and require no elaboration. However a few explanatory notes 
are necessary. 

For the analysis, the observations were divided into age groups as 
follows: (1), 90 years and over; (2), 80-89 years; (3), 70-79 years; 
(4), 60-69 years. It must not be presumed that these age groups are 
strictly comparable and this is to be borne in mind in assessing the 
meaning of age differences. Where suitable, the difference between the 
means of two groups has been tested for statistical significance by means 
of Student’s t-test; where the variances of the groups differed signifi- 
cantly, the Behrens or d-test (Fisher and Yates 1948) was used. Where 
more than two groups were involved the analysis of variance was 
employed. In some instances this was of doubtful validity owing to the 
ratio of the variances (F) of the different age groups being significant,
but the general picture shown by the analysis is probably roughly true. 
Similarly, although some of the distributions are clearly skewed, this is 
unlikely to affect the conclusions materially (Pearson 1931). 

In considering age variation, analysis of variance was preferred to 
calculation of a correlation or a regression coefficient, since within these 
wide age limits there seemed to be no reason to suppose a linear increase 
or decrease with age. The figures analysed showed that in some cases 
at any rate, the change with age was not in fact linear. Owing to the 
unequal numbers involved in the various groups, no attempt was made 
to carry out analysis of variance of the data in such a way as to divide 
the variance into the components—“ between ages,” “between sexes,” 
“interaction ” and “ residual”; i.e. age variation was assumed inde- 
pendent of sex difference. The difficulties and uncertainties of a complete 
analysis with unequal numbers [Kendall 1946(a)] did not appear jus- 
tified in this case. 

In comparing young with old, the groups have as far as possible been 
chosen to be analogous. If, for example, all those in the young group 
were men, only observations in older men were used in comparison. In 
some instances the observations on the young groups were made at a
number of different times of the day and the “ between times ” variation 
proved significant in the analysis of variance (Renbourn 1947), whereas 
the observations on the older individuals can only be described as having 
been made between 9 a.m. and noon. In these cases for the purpose of 
comparison, the mean of the young group has been taken to be the 
average of the means at 9 a.m. and noon, and the variance used is the 
sum of the variance “between individuals” and the variance corre- 
sponding to the difference between the means at 9 a.m. and noon.

In addition to the comparisons of principal interest described above, 
an attempt was made to relate the rating of “ fitness ” already described, 
to the E.S.R. and blood urea. Owing to the small number assessed as
C, groups B and C were combined (group BC) and compared with 
group A. Since the proportion of people classified A rises with advancing 
age, a direct t-test on the difference between the mean of group A and 
group BC is only conclusive if there is no marked age variation. The 
ideal method would be to obtain for A and BC separate regression lines 
for such measurement on age, and to determine whether the slopes of 
these lines, or the means of the observations adjusted for slope, were 
significantly different [Kendall 1946(b)]. This however implies
linearity of change with age and homogeneity of variance. In the case of
the E. S. R. where there is marked (although not significant) age varia- 
tion, probably the former and certainly the latter condition is not satisfied. 
Consequently the t-test or the Behrens test has been carried out separately 
for each age group. With the blood urea, there are no A’s in group 4 
and the remaining 3 groups appear to be homogeneous, so that a com- 
parison has been made between the means of these groups combined. 

Where regressions have been calculated between two variates other 
than age, the coefficients may be due to the fact that the variates them- 
selves change with age. In order to estimate the effect of age itself, 
a partial regression should be calculated. Comparison of scatter diagrams 
[between (a) haematocrit and E.S.R. and (b) diastolic blood pressure
and blood urea], for the various age groups, suggested that elimination 
of the age factor would not materially alter the conclusions drawn 
concerning the regression and the correlation coefficients. 
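
The partial calculation referred to above can be sketched as follows. This is a minimal illustration that assumes the standard first-order partial correlation formula; the function name and the numerical correlations with age are hypothetical and are not taken from the study.

```python
import math


def partial_correlation(r_xy: float, r_xz: float, r_yz: float) -> float:
    """Correlation of x and y with a third variate z (e.g. age) held constant.

    Standard first-order formula:
        r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
    """
    denom = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    if denom == 0.0:
        raise ValueError("x or y is perfectly correlated with z")
    return (r_xy - r_xz * r_yz) / denom


# Hypothetical values, for illustration only: a raw correlation of -0.27
# between two variates that are each only weakly correlated with age.
example = partial_correlation(r_xy=-0.27, r_xz=0.10, r_yz=-0.05)
```

When both variates are only weakly related to age, as in this made-up case, the partial coefficient stays close to the raw one, which is the situation the scatter diagrams were taken to suggest.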

Data insufficient for statistical analysis have not been included in 
this study. The term significant is used here in the statistical sense. 
Complete data have not been given, but it has been considered of suffi- 
cient interest to give that of the oldest group (Table 1). 


RESULTS 
Data of the oldest age group 


This contains 12 individuals of age 90-104. Details are shown in 
Table 1. 

The table shows a number of points of interest. The man of 104 is 
of group A and shows on one or other occasion figures for haematocrit, 
E.S.R., blood urea and blood pressure, which are accepted as normal
for young adults. The consistency of the measurements (except for 
blood urea) in any individual may be noted. In the case of blood urea 
it appears that a figure as high as 125 mgs may occur in an otherwise 
healthy individual and fall to 62 mgs in a second test. In no case is 
the systolic blood pressure over 160 mm or the diastolic over 90 mm. 


Distribution of “ fitness” rating in age groups 


Examination of the distribution of group A and group BC (sexes 
combined), in the various age groups shows that in the oldest group the 
proportion of A’s is much higher. This cannot be tested for significance 
by the χ² test, since the expectation of A’s in this group is very small
(3.4) and gives a large contribution to the total χ². If the oldest group
is excluded we have χ² = 0.69, 2 d.f., not significant. It seems extremely
probable that in the oldest group the distribution of the A’s and BC’s 
will be different, since a subjective assessment of so unusual a population 
is bound to be unreliable and subject to bias. 


TABLE 1
Data from individuals over 90 years

SEX  SUBJECT  AGE  “FITNESS”  HAEMATOCRIT  E.S.R.            BLOOD UREA  BLOOD PRESSURE
                   RATING

m    1        104  A          43 (8)       20 (1)            51 (37)     120/85 ([illegible])
m    2        95   A          45 (45)      31 (32)           62          160/80 (140/70)
m    3        94   A          46           22 (8)            55          160/70 (160/80)
m    4        93   A          47 (49)      18 (18)           125 (62)    110/80 (105/70)
m    5        93   A          52           3 (5)             36          135/70 (140/70)
m    6        92   A          42           11 (19)           56          125/80 (120/80)
m    7        91   A          40           11 (9)            42 (62)     160/70 (150/70)
m    8        90   B          37           31                68          125/80

f    1        94   A          46 (43)      10 ([illegible])  29          140/80 (120/70)
f    2        92   B          —            49 (46)           28          140/65 ([illegible])
f    3        92   C          —            30                59 (47)     150/90 (140/85)
f    4        91   B          —            23                26          120/70


m = male; f = female. Figures in parentheses are repeat examinations.


The data are too few to warrant comparison of “ fitness” rating 
between sexes, but it would appear on the face of it that for age groups 
combined the incidence of A’s is higher in men. It is however not likely 
that the “ fitness ” rating is a good index of longevity. 


Haematocrit


The results are summarized in Table 2. 

In view of the very restricted age range for which figures for women 
are available, age variations for men and women have been treated 
separately. In the case of the women only two groups are available for 
comparison. The difference between the group means is clearly not 
significant. In the case of men an analysis of variance has been carried 
out as shown in Table 3. 
The analysis shows no significant age variation among the men. For
comparison with young men we have: 


Young group: 40 individuals (20-40 years): 39 d.f., mean = 45.37, S.D. = 2.83
Old group: 43 individuals (60-104 years): 39 d.f., mean = 43.86, S.D. = 5.11.

TABLE 2
Haematocrit, summary of results

AGE GROUP    SEX  NO. IN  MEAN ± S.E.m.  S.D.  SEX DIFFERENCE      COMBINED SEXES
(years)           GROUP                        t      Sig. level   No. in  Mean ± S.E.m.  S.D.
                                                                   group

1 (90+)      m    8       44.0 ± 1.6     4.6
             f    insufficient data
2 (80-89)    m    17      43.5 ± 1.4     5.9   <1     Not sig.     27      43.1 ± 1.1     5.7
             f    10      42.5 ± 1.7     5.5
3 (70-79)    m    13      43.5 ± 1.1     4.1   ~1.35  Not sig.     31      42.1 ± 0.89    5.0
             f    18      41.1 ± 1.4     5.8
4 (60-69)    m    5       45.8 ± 2.4     5.3
             f    insufficient data
1 to 4       m    43      43.9 ± 0.76    4.9
             f
2 to 3       m    30      43.5 ± 0.93    5.1   ~1.4   Not sig.     58      42.6 ± 0.70    5.3
             f    28      41.6 ± 1.02    5.4


TABLE 3
Haematocrit, analysis of variance, males, age groups 1-4

SOURCE OF            SUM OF    D.F.  MEAN    F    SIGNIFICANCE
VARIATION            SQUARES         SQUARE       LEVEL
Between age groups     23.0      3    7.7    <1   Not sig.
Within age groups    1016.2     39   26.1
Total                1039.2     42


D.F. = degrees of freedom.
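
As a check on how such an analysis of variance follows from the group summaries, the sketch below recomputes the sums of squares from the numbers, means and standard deviations given for the four male age groups in Table 2. It is an illustration only: the function name is hypothetical, and because the inputs are rounded summary figures the output agrees with Table 3 only approximately.

```python
def one_way_anova(groups):
    """One-way analysis of variance from (n, mean, sd) summaries of each group."""
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n
    # Between-groups sum of squares: weighted squared deviations of group means.
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    # Within-groups sum of squares: recovered from each group's S.D.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return {
        "grand_mean": grand_mean,
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "F": f_ratio,
    }


# Male haematocrit, age groups 1 to 4, as (n, mean, S.D.) read from Table 2.
male_haematocrit = [(8, 44.0, 4.6), (17, 43.5, 5.9), (13, 43.5, 4.1), (5, 45.8, 5.3)]
result = one_way_anova(male_haematocrit)
```

With these inputs the between-groups and within-groups sums of squares come out near 23 and 1019 on 3 and 39 degrees of freedom, and F is well below 1, in line with the table.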


(The degrees of freedom for the old group are reduced by 4 to 39 since
the “between age groups” variance has been eliminated.) From the
ratio of the standard deviations we obtain F = 3.26, n₁ = n₂ = 39,
P < 0.1%, significant. The young group is thus less variable.

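The difference between the means is tested by the Behrens method.
This gives

\[ d = \frac{45.37 - 43.86}{\sqrt{s_1^2 + s_2^2}} = 1.62, \quad \text{not significant,} \]

where

\[ s_1 = \text{S.E.m.}_1 = \frac{2.83}{\sqrt{39}}, \qquad \frac{s_2}{s_1} = 1.81 = \tan\theta, \quad \theta = 61^\circ , \]

with s₂ the corresponding quantity for the old group.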
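
The Behrens statistic for this comparison can be recomputed with a short sketch. It assumes, as the figures in the text imply, that each standard error is taken as S.D./√(d.f.) with 39 degrees of freedom in both groups; the function name is hypothetical, and the significance of d still has to be read from the Fisher and Yates tables, which are not reproduced here.

```python
import math


def behrens_d(mean_1, sd_1, df_1, mean_2, sd_2, df_2):
    """Return (d, theta in degrees) for the Behrens comparison of two means.

    Assumption: the standard error of each mean is S.D. / sqrt(d.f.).
    """
    s_1 = sd_1 / math.sqrt(df_1)
    s_2 = sd_2 / math.sqrt(df_2)
    d = (mean_1 - mean_2) / math.sqrt(s_1 ** 2 + s_2 ** 2)
    theta = math.degrees(math.atan2(s_2, s_1))  # tan(theta) = s_2 / s_1
    return d, theta


# Young group (subscript 1) against old group (subscript 2), haematocrit.
d, theta = behrens_d(45.37, 2.83, 39, 43.86, 5.11, 39)
```

This gives d close to 1.6 and θ close to 61°, agreeing with the values quoted to within rounding.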


Thus while there is a suggestion that the younger groups may have a 
higher haematocrit value than the old, this is not statistically proven. 
Examination of the data shows no relationship between “ fitness ” rating 
and haematocrit value. 


Erythrocyte sedimentation rate 


The results are summarised in Table 4. 


TABLE 4
Erythrocyte sedimentation rate: Summary of results

AGE GROUP    SEX  NO. IN  MEAN ± S.E.m.  S.D.  SEX DIFFERENCE      COMBINED SEXES
(years)           GROUP                        t      Sig. level   No. in  Mean ± S.E.m.  S.D.
                                                                   group

1 (90+)      m    8       18.4 ± 3.5     9.9   ~1.3   Not sig.     12      21.6 ± 3.61    12.5
             f    4       28.0 ± 8.1     16.3
2 (80-89)    m    28      24.2 ± 2.7     14.5  <1     Not sig.     51      [illegible]    14.7
             f    23      25.1 ± 3.2     15.3
3 (70-79)    m    28      22.1 ± 2.4     12.7  <1     Not sig.     51      22.8 ± 1.82    13.0
             f    23      23.7 ± 2.9     13.6
4 (60-69)    m    18      15.8 ± 2.1     8.8   <1     Not sig.     22      15.4 ± 1.74    8.2
             f    4       13.5 ± 2.4     4.8
1 to 4       m    82      21.07 ± 1.39   12.6  ~1.2   Not sig.     136     22.18 ± 1.14   13.3
             f    54      23.87 ± 1.93   14.2

There is no significant sex difference in any of the age groups or in 
the age groups when combined. A similar result is obtained from a few
people aged 50-59 years, for whom the results are:

Male: mean = 8.15 ± 1.1 (n = 13), S.D. = 4.0
Female: mean = 6.71 ± 1.2 (n = 7), S.D. = 5.2,
t < 1, not significant.


A significant sex difference hence appears absent over the age of 50. 
The variance analysis with respect to age is given in Table 5. 


TABLE 5
E.S.R.: Analysis of variance, combined sexes, groups 1-4

SOURCE OF            SUM OF     D.F.  MEAN    F     SIGNIFICANCE
VARIATION            SQUARES          SQUARE        LEVEL
Between age groups    1 338.3     3   446.1   2.63  P ≈ 0.05
Within age groups    22 410.1   132   169.8
Total                23 748.4   135


The differences between age groups are close to the 5 per cent level 
of significance. There is a strong suggestion that the mean value of 
group 4 is lower than the means of other groups, the other groups being 
strikingly homogeneous in their means. There is also a suggestion that 
the standard deviation of group 4 is lower than that of the other groups. 
When tested by the Behrens test, the difference between the mean of 
group 4 and that of the other three groups was significant at the 1 per 
cent level. 

Two younger groups are available for comparison. The first is
the group of 20 men and women (combined) of age 50-59 given above,
for whom the mean = 7.65 ± 0.79, S.D. = 3.50. The second group
consists of 40 young men of age 20-40 and the results obtained are:
mean = 2.97 ± 0.68, S.D. = 4.27. It is clear that for both of these
younger groups the means and standard deviations are lower than even 
group 4 (60-69 years). 

We may therefore conclude that both the mean and standard deviation 
rise with age until about 70 years, after which they stay fairly steady. 
At 60 they are both still rising. 

The comparison of “fitness” rating and E.S.R. is summarized in
Table 6.

TABLE 6
Comparison of “fitness” rating and E.S.R., combined sexes

AGE GROUP    CLASS  NO. IN       MEAN ± S.E.m.  S.D.         VARIANCE BC / VARIANCE A   DIFFERENCE BC & A
(years)             GROUP                                    F      Sig. level          t or d              Sig. level

1 (90+)      A      [illegible]  15.8 ± 3.1     8.8          1.6    Not sig.            t = 3.2             P < 0.1%, sig.
             BC     4            33.3 ± 5.5     11.1
2 (80-89)    A      [illegible]  [illegible]    [illegible]  5.8    P < 1%, sig.        d = 3.4, θ = 41°    P < 1%, sig.
             BC     42           26.7 ± 2.6     16.6
3 (70-79)    A      [illegible]  [illegible]    [illegible]  3.2    P < 5%, sig.        d = 5.6, θ = 45°    P < 1%, sig.
             BC     38           27.9 ± 2.0     12.6
4 (60-69)    A      [illegible]  [illegible]    [illegible]  15.9   P < 1%, sig.        d = 5.0, θ = 26°    P < 1%, sig.
             BC     17           [illegible]    7.8


Probability levels of the Behrens test are not tabulated for P < 1%, 
but it seems extremely likely that for groups 3 and 4, the differences 
are significant at 0.1%. It is clear that the “ fitness ” rating divides the 
population into groups whose E. S. R. have quite different characteristics, 
the A group being of low mean and variance, with the BC group being 
of high mean and (with the possible exception of the oldest group) of 
high variance. Within each of these groups there appears to be an 
increase in E. S.R. with increasing age; this is in contrast to the 
picture obtained on the combined data of all three groups A, B, C, 
shown above (Table 4), where over the age group of 60-69 years there 
is no marked trend with age. Analysis of variance shows however that 
in neither case is the trend significant. 


Red cell fragility 


One hundred and one estimations (combined sexes, age 60-89) have 
been compared with those of 38 healthy adults (combined sexes) of ages 
18-40 years. The results are comparable since the same method and 
technician were used for both age ranges. In both groups the concen- 
trations of saline solution for complete haemolysis were 0.30 to 0.34 per 
cent and since the frequency distributions appeared very similar no 
analysis was done. In the case of commencing haemolysis, the distri- 
butions were sufficiently different to warrant analysis (see Table 7). 

TABLE 7
Frequency distribution of commencing haemolysis

            SALINE %                                    NO. IN
GROUP       0.42  0.44  0.46  0.48  0.50  0.52  0.54    GROUP

Young       0     11    17    8     2     0     0       38
Old         0     2     8     38    48    3     2       101

Comparison by the t-test gives:
Young group, mean = 0.4505 ± 0.0028, S.D. = 0.017
Old group, mean = 0.4795 ± 0.0017, S.D. = 0.017
t = 8.96, P < 0.1%, significant.


There appears to be a highly significant difference between old and young 
groups. 
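
The t value can be recovered directly from the frequency distribution in Table 7. The sketch below is an illustration under stated assumptions: each subject is scored at the saline concentration heading his column, and the ordinary pooled-variance form of Student's t is used; the function names are hypothetical. The absolute means obtained this way depend on how the column labels are read, whereas the difference between the means and the value of t do not.

```python
import math

# Saline concentrations (%) and frequencies as printed in Table 7.
SALINE = [0.42, 0.44, 0.46, 0.48, 0.50, 0.52, 0.54]
YOUNG = [0, 11, 17, 8, 2, 0, 0]
OLD = [0, 2, 8, 38, 48, 3, 2]


def summarise(values, counts):
    """Return (n, mean, sum of squared deviations) for grouped data."""
    n = sum(counts)
    mean = sum(v * c for v, c in zip(values, counts)) / n
    ss = sum(c * (v - mean) ** 2 for v, c in zip(values, counts))
    return n, mean, ss


def pooled_t(values, counts_a, counts_b):
    """Student's t for the difference of two means, pooled variance."""
    n_a, mean_a, ss_a = summarise(values, counts_a)
    n_b, mean_b, ss_b = summarise(values, counts_b)
    pooled_var = (ss_a + ss_b) / (n_a + n_b - 2)
    se_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_b - mean_a) / se_diff


t_value = pooled_t(SALINE, YOUNG, OLD)
```

The result is about 9.0 on 137 degrees of freedom, which agrees with the quoted t = 8.96 to within the rounding of the published summary figures.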
White blood cell count 


The result for 25 individuals (combined sexes) of age 70-79 years 
was as follows: 
mean = 7200 ± 362; S.D. = 1810.


Blood urea 


The results are summarized in Table 8. 


TABLE 8
Blood urea: Summary of results

AGE GROUP    SEX  NO. IN  MEAN ± S.E.m.  S.D.  SEX DIFFERENCE      COMBINED SEXES
(years)           GROUP                        t      Sig. level   No. in  Mean ± S.E.m.  S.D.
                                                                   group

1 (90+)      m    8       59.2 ± 10.0    28.3  1.53   Not sig.     12      51.4 ± 7.7     26.8
             f    4       35.5 ± 7.9     15.7
2 (80-89)    m    17      48.9 ± 3.4     14.0  <1     Not sig.     31      48.1 ± 2.6     14.3
             f    14      47.1 ± 4.1     15.3
3 (70-79)    m    14      52.6 ± 3.2     12.2  1.39   Not sig.     32      55.2 ± 1.7     9.8
             f    18      57.2 ± 2.0     7.8
4 (60-69)    m    7       39.3 ± 3.9     10.4  <1     Not sig.     9       38.1 ± 3.5     10.6
             f    2       34.0 ± 10.0    14.1
1 to 4       m    46      50.4 ± 2.5     16.9  <1     Not sig.     84      50.2 ± 1.7     15.6
             f    38      50.0 ± 2.3     14.0

It thus appears that there is no significant difference between the
sexes in the age range examined. Consequently for variance analysis
the sexes were combined. The results are given in Table 9.


TABLE 9
Blood urea: Analysis of variance, combined sexes

SOURCE OF            SUM OF     D.F.  MEAN    F     SIGNIFICANCE
VARIATION            SQUARES          SQUARE        LEVEL
Between age groups    2 255.4     3   751.8   3.35  P ≈ 5%
Within age groups    17 964.7    80   224.6
Total                20 220.1    83


The differences between the means of the age groups are thus barely 
significant. The effect appears to be largely due to the low blood urea 
in the youngest age group. It is therefore of interest to compare these 
data with those obtained from estimation of blood urea from 18 healthy 
men and women (sexes combined) in the age range of 20-25 years. The 
results from the younger group are: 


mean = 28.05 ± 1.6, S.D. = 6.6.


The difference between this mean and that of the older subjects
(groups 1-4, sexes combined) is overwhelmingly significant. Similarly
the variance of the young group is very much lower than that of the 
least variable of the older groups. 

Both the mean and the standard deviation of the blood urea therefore 
appear to rise sharply with age. The mean reached a steady level above 
the age of 70, but from the very limited data available in the age range 
60-69 it does not appear to have reached its final steady value at that age. 
The standard deviation appears to increase steadily up to the oldest age 
group. 

The data relating blood urea to “fitness” assessment are summarized
in Table 10.
There were no A’s in group 4, so that this group is excluded from the 
analysis. There is no general trend in difference between the classes in 
the various age groups, and the mean difference between A’s and BC’s 
for the three groups combined is clearly not significant. The variance 
of the A’s does however appear to be significantly greater than that of
the BC’s. 

TABLE 10
Mean blood urea and “fitness” rating, combined sexes

AGE GROUP        GROUP A                       GROUP BC
                 No. in group   Mean           No. in group   Mean

1                8              54.5           4              45.3
2                5              43.0           26             49.1
3                6              50.3           26             56.3
Groups 1 to 3    19             50.2 ± 4.8     56             52.1 ± 1.8
                                S.D. = 21.0                   S.D. = 13.4


Blood chloride 


The results from 16 individuals (combined sexes) of age 70-79 years 
give the following figures: 


mean = 481.0 mgs% ± 5.9, S.D. = 23.8
     = 82 mM ± 1.0, S.D. = 4.0.


Blood cholesterol 


The results from 16 individuals (combined sexes) of age 70-79 years 
gave the following figures: 


mean = 186.0 mgs% ± 5.9, S.D. = 23.8.


Blood pressure, systolic 
The results are summarised in Table 11. 


TABLE 11
Blood pressure, systolic: Summary of results

AGE GROUP    SEX  NO. IN  MEAN ± S.E.m.  S.D.  SEX DIFFERENCE           COMBINED SEXES
(years)           GROUP                        t            Sig. level  No. in       Mean ± S.E.m.  S.D.
                                                                        group

1 (90+)      m    8       139.4 ± 7.5    21.3  <1           Not sig.    12           138.8 ± 5.2    18.2
             f    4       137.5 ± 6.3    12.6
2 (80-89)    m    21      152.4 ± 4.8    21.8  <1           Not sig.    38           [illegible]    22.9
             f    17      145.8 ± 5.9    24.4
3 (70-79)    m    28      159.7 ± 4.4    23.0  [illegible]  Not sig.    [illegible]  [illegible]    [illegible]
             f    22      169.6 ± 7.3    34.4
4 (60-69)    m    insufficient data
             f
1 to 3       m    57      154.1 ± 3.0    23.0  <1           Not sig.    [illegible]  [illegible]    [illegible]
             f    43      157.2 ± 5.0    32.5

Neither in the groups examined (1, 2, 3) nor in these groups combined
is there a significant sex difference. Age variation for sexes combined 
was subjected to variance analysis. The differences were found to be 
significant at the 1% level. It thus appears that the means of the systolic 
blood pressure decrease with increasing age in the age range 70 to 104 
years. There is a suggestion in the data that the variances also decline. 


Blood pressure, diastolic 


The results are summarised in Table 12. 


TABLE 12
Blood pressure, diastolic: Summary of results

AGE GROUP    SEX  NO. IN  MEAN ± S.E.m.  S.D.  SEX DIFFERENCE      COMBINED SEXES
(years)           GROUP                        t      Sig. level   No. in  Mean ± S.E.m.  S.D.
                                                                   group

1 (90+)      m    8       78.1 ± 1.9     5.3   <1     Not sig.     12      77.5 ± 2.1     7.2
             f    4       76.3 ± 5.4     10.8
2 (80-89)    m    21      87.1 ± 2.4     11.2  <1     Not sig.     38      85.6 ± 2.0     12.3
             f    17      83.5 ± 3.3     13.6
3 (70-79)    m    28      86.1 ± 4.7     25.0  ~1.1   Not sig.     50      89.3 ± 3.3     23.6
             f    22      93.4 ± 4.6     21.7
4 (60-69)    m    insufficient data
             f
1 to 3       m    57      85.4 ± 2.5     18.9  <1     Not sig.     100     86.5 ± 1.9     18.8
             f    43      87.9 ± 2.9     18.8


The marked decrease in standard deviation with increasing age may 
be noted. For the combined sexes, the slight decline in the means from 
group 3 to group 2 was not significant. The decline from group 2 to 
group 1 was significant at the 1 per cent level, as tested by Behrens test. 

It seems that the diastolic blood pressure, after remaining sensibly 
steady between 70 and 90 years of age, falls appreciably and that the 
variance between individuals falls markedly with increasing age in this 
range. The data are not sufficient to warrant a complete analysis 
according to “ fitness” rating but superficial examination suggests no 
appreciable difference between group A and group BC in either systolic 
or diastolic pressure. 

EXAMINATION OF REPEAT ESTIMATIONS
Haematocrit 


Examination of repeated tests showed no significant difference between 
the first and second test, with a relatively small standard deviation. 
These results correspond to those found by us in young adults and 
suggest the existence of a haematocrit value characteristic of the indi- 
vidual (Renbourn 1947). 


Erythrocyte sedimentation rate 


The results are summarised in Table 13. 


TABLE 13
E.S.R.: Analysis of repeated tests

AGE            MEAN    NO. IN   S.D.   t    SIGNIFICANCE
                       GROUP                LEVEL
90 and over   −1.00    10       6.60   <1   Not sig.
80-89          1.05    17       4.85   <1   Not sig.
70-79         −0.30    15       5.89   <1   Not sig.

There is hence no significant difference between first and second tests. 
Examination did not reveal any relation between the differences and the 
numerical value of the E.S. R. i.e. errors are not obviously greater for 
fast than for slow rates. 
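
The t values in Table 13 can be checked from the tabulated figures. The sketch assumes that the MEAN and S.D. columns refer to the differences between the first and second tests in each individual, so that t is the mean difference divided by its standard error; the function and variable names are hypothetical.

```python
import math


def paired_t(mean_diff: float, sd_diff: float, n: int) -> float:
    """t for a mean paired difference: mean / (S.D. / sqrt(n))."""
    return mean_diff / (sd_diff / math.sqrt(n))


# (mean difference, S.D. of differences, number in group), from Table 13.
table_13 = {
    "90 and over": (-1.00, 6.60, 10),
    "80-89": (1.05, 4.85, 17),
    "70-79": (-0.30, 5.89, 15),
}
t_values = {age: paired_t(*row) for age, row in table_13.items()}
```

All three values are smaller than 1 in absolute size, as the table reports.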


Blood urea 


The data from two age groups (70-79 years, and 80 years and over) 
have been examined separately. In neither case was there a significant 
difference between first and second tests. The standard deviations of the 
differences were 9.8 and 22.3 respectively. 

In the three variates examined above there is no systematic difference 
between first and second tests but the amount of variation may be 
appreciable. 

Incidence of “ normality ” 


In order to obtain an assessment of “ normality ” in terms of one or 
more of 5 tests used in this study, data were examined from 61 people 
of both sexes over 70 years of age for whom results for all the tests were
available.

The tests used and the criteria of normal levels accepted for adults
were as follows: (a) haematocrit (38% and over), (b) E.S.R.
(15 mm/hr. and below), (c) blood urea (45 mgs.% and below), (d)
systolic pressure (160 mm Hg and below), (e) diastolic pressure
(90 mm and below). For the individual tests the incidence of “normality”
was as follows: haematocrit 49/61, E.S.R. 21/61, blood urea
21/61, systolic pressure 41/61, and diastolic pressure 35/61.

When the tests are considered together the results are as shown in
Table 14. 

TABLE 14
Distribution of “normality” in 5 tests. 61 people 70-104 years.

NUMBER OF TESTS FOR WHICH     FREQUENCY
“NORMALITY” IS PRESENT

5                             5
4                             10
3                             24
2                             15
1                             5
0                             2


It is seen that if normal adult levels are required for the above 5 tests, 
then only 5/61 (8.2%) can be considered as normal. The mode of the 
distribution corresponds to 3 tests. 
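
The two figures quoted from Table 14 follow from a simple tally, sketched here with the frequencies as tabulated; the variable names are illustrative only.

```python
# Number of tests (out of 5) showing "normality" -> number of people, Table 14.
frequency = {5: 5, 4: 10, 3: 24, 2: 15, 1: 5, 0: 2}

total_people = sum(frequency.values())
# Percentage of people who meet the adult criteria on all 5 tests.
all_normal_pct = 100 * frequency[5] / total_people
# Most frequent number of "normal" tests (the mode of the distribution).
mode_tests = max(frequency, key=frequency.get)
```

This gives 61 people in all, 8.2% normal on every test, and a mode of 3 tests.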


Correlation coefficients and regression equations 


The results are given in Table 15 below. 


TABLE 15
Correlation coefficients and regression equations, combined sexes

VARIATE                           CORRELATION   NO. OF PAIRS OF   SIG.     REGRESSION
x                y                COEFFICIENT   OBSERVATIONS      LEVEL    EQUATION

E.S.R.           Blood urea       +0.078        83
E.S.R.           B.P. systolic    −0.044        102
B.P. diastolic   Blood urea       +0.233        70                P ≈ 5%   y = 32.5 + 0.22x
Haematocrit      E.S.R.           −0.270        70                P < 5%   y = 57.4 − 0.76x


From what has been said above (see Method) it is likely that if the 
effect of age were removed, the partial correlation coefficients and the 
regression equations would not be markedly different. 

DISCUSSION AND REVIEW OF LITERATURE


It has not been sufficiently realised until recent years that appreciable 
errors are associated with the measurement of Hb. concentration (Mac- 
Farlane et al 1948) and R. B. C. count (Berkson et al 1940, Biggs and 
McMillan 1947-8). The random error of haematocrit measurement is
small, and systematic errors are well known (Millar 1925, Maizels 1945). 
There has been controversy as to variation in R. B. C. and haematocrit 
in different parts of the vascular system, but consensus of opinion is that 
it is small. Diurnal and short term variation in haematocrit has been 
described by one of us (Renbourn 1947) and seasonal variation has been 
reported by Pucher et al (1934). Although the sum of such errors and 
variations may be appreciable, they may perhaps be neglected in studies 
of groups of individuals examined under standard conditions. 

The earlier literature concerning the R. B.C. in old people was 
reviewed by Schwinge (1898), and many workers considered that anaemia 
was not an uncommon finding. Millet and Balle-Helaers (1932) among 
other observers described a rise in R. B. C. count and Hb. in old age but 
it is not clear whether the subjects in these various studies can always 
be regarded as clinically normal. 

No anaemia has been found in healthy old people by Wintrobe
(1933), Miller (1934), Fowler et al (1941), Newman and Gitlow (1943),
Olbrich (1947) and Howell (1948); the data reported here confirm this
general finding. Although the mean haematocrit values in old people are
lower than the standards described for young adults (Wintrobe 1933,
Price-Jones et al 1935) the old and young groups are not always strictly
analogous. In our data the old group has a lower mean value but the
difference found was not significant. The young group was composed of
soldiers investigated during one summer month of the same year, whereas 
the old individuals were examined over a period of several years. 

In comparable groups of young adults women usually show lower mean 
haematocrit values, and in old people similar results are described by 
Fowler et al (1941), Newman and Gitlow (1943), Olbrich (1947) and 
ourselves. Although in our data there does not appear to be a significant 
sex variation, this is present in the data of Newman and Gitlow and 
that of Olbrich. Much more data are required on sex difference in the 
various properties of the R. B.C. 

The technical errors associated with measurement of the E. S. R. are
well known and short and long term variations have been described
(Renbourn 1947). The range of values in healthy adult males has been
given by Wintrobe and Landsberg (1935) as 0-9 mm/hr. However using 
the same techniques our results show that figures as high as 13-15 mm. 
may be found in the complete absence of disease, and repeated tests over 
a long period of time show that these are consistent in a particular 
individual. The data of Gallagher (1934) give a normal range of 
0-20 mm. in young men. 

Gram (1929) corrected the E.S.R. for the presence of anaemia by 
the Hb. content of the blood, and Wintrobe and Landsberg (1935) based 
their correction curves on the haematocrit value. However since the 
validity of any such procedure is open to question (Terry 1950), we have 
not attempted to correct our data for a standard haematocrit value. 
Wintrobe and Landsberg (1935) found in young adults a high correlation
between the E.S.R. and haematocrit (r = -0.86 ± 0.013), but the
heterogeneity of a group of old people may explain the considerably lower
figure found by us (r = -0.27, P < 5%).

The early German workers (Westergren 1924, Katz and Leffkowitz
1928, Löw Beer 1929, Burkhardt 1932, Lasch 1931) described an increase
of E. S. R. in individuals over the age of 60 years. Miller (1935-6) and 
Eckerstrom (1950) found a wider range of values in old people than in 
young adults, but nevertheless claimed that in the absence of disease 
there was no age variation. On the other hand Brodin et al (1937), 
Holzer (1938), Akizuki and Hosi (1938), Hampe (1940) and Olbrich 
(1948) found an increase in E. S. R. in healthy old people. Examination 
of our data suggests a progressive rise in mean and standard deviation 
from young adult life to 80-89 years, followed by what appears to be 
a fall, but analysis shows that up to the age of 70 the increase with age 
is significant, after which the variations can be attributed to random 
fluctuation about a constant mean. 

All workers describe a faster E.S. R. in women. This is first apparent 
in puberty and becomes more definite in adult life. The data of Wintrobe 
and Landsberg (1935) and Terry (1950) show that this difference is 
significant. Among elderly individuals Olbrich (1948) noted a very 
small sex difference, and in the present data no evidence for a significant 
difference was found. 

Mollison (1946) and Adelsberger (1946) suggested that malnutrition 
can produce an increase in E.S.R. Kountz et al (1947) found a nega- 
tive nitrogen balance in 11/27 healthy old people, but this is not to be 
regarded as deficient nutrition in the ordinary sense; digestion, absorp- 
tion and endocrinal factors must be considered. 

A high blood cholesterol is often associated with a raised E. S. R. but
as shown below the former is not characteristic of the ageing process. 
Although the causal relation of the plasma proteins to the suspension 
stability of the R. B.C. is not clear (Ropes et al 1939, Kopp 1941-2) 
there does appear to be a connection between plasma globulin or fibrinogen 
concentration and the E.S. R. Some investigators report an increase in 
the total plasma protein concentration in old age (Boselli et al 1950) 
whereas others report a fall (Bock 1947, Olbrich 1948) but there appears 
to be a general agreement on an increase in the globulin fractions. 

The literature provides evidence of a hypothalamic and endocrinal 
control of plasma protein fractions (Podhradszky 1940, White and 
Dougherty 1945, Levin 1944), and it is possible that sex and age variation
in concentration of globulins and in E. S. R. are partly conditioned
by these systems.

We have shown that when the E.S.R. is classified according to 
“ fitness ” rating, the means for the fittest individuals (group A) are 
significantly lower than those of the remainder (group BC) in each group 
examined, but still significantly higher than the means of a young group. 
Since the rating was entirely subjective the association is of considerable 
interest. If there is a relationship between the wear and tear processes 
of ageing and the colloidal properties of the blood plasma we may have 
an explanation of our findings. 

On the other hand post mortem examinations of old people dying of
accidents have shown conclusively that degeneration, infection and even
malignancy may exist without obvious signs or symptoms during life.
We cannot exclude the possibility that this type of latent disease gives 
rise to the increased E.S. R. of apparently healthy old people, nor is 
evidence available as to its incidence or severity in the “ fittest ” indivi- 
duals as compared to the remainder. Unpublished data (Renbourn) 
show little relationship between E.S. R. and a subjective assessment of 
“ fitness ” in young adults. 

However the difference in the range of “ fitness ” between a group of 
old people and one of young adults is probably considerable. If we can 
assume that the A’s are comparable to the young groups in our data, 
we have a more or less steady increase in mean E.S. R. and standard 
deviation reaching a mean of 16.0 mm. in the oldest group. 

Since the E. S. R. is commonly used in the diagnosis of various diseases,
clearly age must be taken into account.

Whitby and Hynes (1935) and Biggs and McMillan (1947-8) have 
shown that there is a relatively wide range in osmotic fragility of the
R. B.C. of normal adults. This is due partly to variation between 
individuals but largely to technical errors. 

Schlomka and Burger (1927) reported an increase in osmotic fragility 
of the R. B.C. in old people and more recently Olbrich (1947) in a 
careful study came to the same conclusion. We find that the difference 
between old and young groups lies almost entirely in commencing hae- 
molysis and that it is significant. The osmotic fragility of the R. B. C. 
seems more heterogeneous in old age. Newman and Gitlow (1943) 
found no evidence for a sex difference. 

There is abundant data concerning errors of the W. B. C. count and 
on the normal level in young adults. Short term variation has been 
investigated by Shaw (1927), and seasonal variation by Engelbreth-Holm 
and Videbaek (1949). 

In old age, Miller (1939), Fowler et al (1941), Newman and Gitlow 
(1943), and ourselves have found levels which do not appear to be 
appreciably different from those of young adults. Aaltonen (1939) found 
that the W. B. C. count falls in old age. However the reported data do 
not seem to have been tested for statistical significance. Gabbert et al 
(1947) found no difference in the means of young and old hospitalised 
patients and concluded that the response to infection was similar. 

There is controversy as to sex difference in the W. B. C. count in old 
age. Newman and Gitlow (1943) described a significant difference but 
this was not confirmed by Olbrich (1947). 

The reports available in the literature concerning serum potassium in 
old age (Cahane 1927, Lucchi 1931, Parhon and Werner 1932, Benetato 
and Ciurdariu 1939) suggest a rise above accepted levels, but confirmation 
of this unexpected finding is necessary. 

Dahl (1950) found little change in blood magnesium and no evidence 
of a sex difference in old age. Ornstein and Vascauteano (1934) found 
no changes in serum sodium from infancy to old age, and our few data 
show no changes in whole blood chloride. There is controversy in the 
literature as to variation in serum calcium and phosphate but the changes 
reported are small. Shock and Yiengst (1950) found no significant age 
variation in the diurnal change in blood pH. 

A good standard method for the estimation of blood cholesterol is 
still lacking. Although the early data of Bloor and Knudson (1917) 
and Gardner and Gainsborough (1927) gave for adults a range of about 
100-220 mgs%, recent work (Boyd 1942, Peters and Man 1943) shows 
a much greater variation, with a range of about 80-360 mgs. 

As early as 1844 Becquerel and Rodier claimed that cholesterol was
increased in the blood of old people, and this has been apparently con- 
firmed by Parhon and Parhon (1923), Bing and Hekscher (1924), 
Lespagnol et al (1938), Aaltonen (1939), and Rafsky and Newman 
(1941-2). Unfortunately in some of the reports the number of cases is 
small, the clinical normality of the subjects open to doubt, and com- 
parable control groups are not included. Recent data in which such 
control groups are available show no evidence for an increase of blood 
cholesterol with ageing. In a careful study with data subjected to 
variance analysis, Page et al (1935) found no significant change. Kountz 
et al (1945) came to a similar conclusion but described a fall after the 
age of 80 years. The data of Peters and Man (1943) show no evidence 
for either age or sex variation. Our own figures appear to be within the 
range expected in young adults. Landé and Sperry (1936) failed to 
detect any relationship in man between blood cholesterol and athero- 
sclerosis. 

The literature dealing with the concentration of nonprotein nitrogen 
constituents of the blood in old age is already voluminous. Hoffstater
et al (1950) found no evidence for a change in concentration of amino
acids of the blood with ageing. Little or no age variation in blood uric
acid appears to exist, but a sex difference is described (Bröchner-Mortensen
1938, Bulger and Johns 1941, Praetorius 1951).

Huffman (1939) found no change in nonprotein nitrogen concen- 
tration in the blood of old people, but both Aaltonen (1939) and Olbrich 
(1948) described a rise above normal levels. 

It is generally assumed that the range of blood urea in young healthy 
adults is 20-40 mgs% and the maximum in the young group of our data 
is 41 mgs. However McKay and McKay (1927) showed that a figure of 
50 mgs. lies within the normal range and also demonstrated a small 
diurnal variation. Pucher et al (1934) reported a seasonal variation. 

An increase in blood urea with ageing was described as early as 1918 
by Rappleye and confirmed more recently by Laroche et al (1933), 
Brodin et al (1937), Lespagnol et al (1938), Musser and Phillips (1929- 
30), and Davies (1949). Lewis and Alving (1938) examined 103 pre- 
sumably healthy men aged 40 to 101 and found a gradual rise in mean 
blood urea and a fall in urea clearance with advancing age. Olbrich 
(1950) and McDonald et al (1951) have recently described a decrease 
in both glomerular filtration and renal blood flow in healthy old people 
and stress the importance of renal vasoconstriction as a causative factor. 

Our data show that a blood urea of over 70 mgs. is uncommon in
old people but 125 mgs. (reduced in a second test to 60 mgs.) occurs 
occasionally (Table 1). 

Repeated tests on old people show that the blood urea is much more 
variable than in young adults and may change from below 40 mgs. to 
over 60 mgs. between two successive occasions. We have found the age 
variation in blood urea to be barely significant (P ~5%) but the 
standard deviation appears to rise progressively with age from the young 
group onwards. The interpretation of this variation is not clear. Pos- 
sibly higher figures than are found in young adults are pathological and 
the increased variation with age is due to a progressive increase in 
incidence of kidney disease.

A urea concentration test was carried out (and repeated at least once) 
in 40 individuals of 70 to 90 years of age and in 17/40 figures consistently
below 2.0 gms% were obtained. However no relationship between
blood urea level, urinary findings (albumin, casts) and results of urea
concentration test was found. The urine specific gravity was normal
but a trace of albumin and the presence of a few casts were not
uncommon. In most cases examined the 24-hour urine volume was
appreciably less than that of young adults and nycturia was common.
No association of blood urea level with “ fitness ” rating was found. 

Although prostatic hypertrophy increases with advancing age there 
was no evidence of urinary obstruction in any of our cases. Furthermore 
our data did not show a sex difference either in blood urea or in urea
concentration ability. 

The kidney tissue is known to shrink progressively throughout life 
and the change is usually well marked in old age. However it would 
not appear likely that the total loss of tissue in the senile kidney is 
sufficient per se to explain the decrease of renal function described above. 
Although there is no doubt that chronic renal failure occurs at all ages, 
the usual clinical features (diastolic hypertension, retinal changes, cardiac 
hypertrophy; polyuria and isosthenuria; phosphate retention and aci- 
daemia) are not characteristic of healthy old age. 

Sodium or chloride deficiency is often associated with a rise in
blood urea (extrarenal azotaemia), but as pointed out above these do
not appear characteristic of old age. It is well known that absorption of
urea occurs in the renal tubules, but the preliminary work of Davies
(1949) suggests that this is in fact diminished in old age.

There is evidence that azotaemia with a negative nitrogen balance 
(among various other metabolic changes) may follow stimulation or
disturbance of the hypothalamus, frontal cortex, or elsewhere in the 
cerebrum (Lewy and Gassman 1935, Allott 1939, Sweet et al 1948, 
McLardy 1950). Disturbance of the anterior pituitary gland in animals 
results in a fall in glomerular filtration, and renal blood flow (White et al 
1942), and clinical pituitary disorders may be associated with a fluc- 
tuating azotaemia, a lowered urea clearance, a delayed excretion of water 
and oliguria (Beaumont and Robinson 1943, Miller 1946). Although 
the changes in endocrine function have not yet been elucidated, there is 
some evidence that old age is associated with a decrease in the eosinophil 
cells and an increase in the chromophobe and vacuolated basiphil cells of 
the anterior pituitary gland (Rasmussen 1936, Ecker 1941). 

These findings suggest that the changes in renal function in old age 
may not be entirely due to a primary deficiency of kidney tissues, and it 
may be necessary to consider the possibility of an altered neuro-endocrinal 
control of renal function or renal blood flow. Such a change in the 
central control mechanisms may play a part in the change of bone meta- 
bolism (osteoporosis with a normal blood calcium and phosphate), protein 
metabolism (negative nitrogen balance, variation in plasma proteins, 
increased E.S. R., azotaemia), water metabolism (cellular dehydration, 
oliguria, nycturia) and sugar metabolism (hyperglycaemia without gly- 
cosuria) which seem fairly characteristic of growing old. The efficiency 
of homeostasis would appear to decrease with advancing age. 

There is still conflicting opinion not only as to the effect of ageing on 
blood pressure but also on the limits in young adults. Although many 
texts give the upper limit in young adults as 140/90 mm. a number of 
observers find that figures of 150/95 mm. or even higher may occur in 
otherwise healthy individuals (Wiggers 1949, Heath 1945, Flaxman 1945). 

The earliest observations suggested that the systolic blood pressure is 
equal to the age plus 100 expressed as mm. Hg, with little or no change 
in the diastolic pressure. More recently Janeway (1913) and Robinson
and Brucer (1939) concluded that neither the systolic nor the diastolic pressure
alters in the higher age groups and that figures over 140/90 mm. represent
hypertension at any age. On the other hand Bowes (1916-17), Lewis 
(1938), Miller (1941), Russek et al (1946) and Howell (1947), claimed 
that the systolic and pulse pressure increased with age, with relatively 
little alteration in diastolic pressure. Robinson and Brucer (1939) after 
excluding all figures over 140/90 mm. came to the conclusion that there 
was no change in blood pressure with age, but as pointed out by Treloar
(1940) such bias may provide results which are simply those desired or
expected.

In our study no attempt has been made to eliminate any data or to 
assign arbitrary limits to the normal range of values. It is possible that 
a number of individuals included in the data were in fact cases of 
symptomless hypertension and clearly their exclusion would have pro- 
duced lower mean values. 

Our data show a rise in mean systolic and pulse pressure to the region 
of 160 mm. and 70 mm. respectively at about the age of 70, after which 
a fall in both occurs. The mean diastolic pressure would appear to 
increase slightly to about 85-90 mm. at the age of 70, remain steady to 
about the age of 90 and then fall. In spite of our few data, the general 
trend in the means corresponds to the earlier work of Bowes (1916-17) 
and the recent data of Russek et al (1946). 

It is generally accepted that the general trend of blood pressure in 
old age is related to the decreased elasticity of the large arteries (Rem- 
ington et al 1948), but not necessarily with the incidence of athero- 
sclerosis (Page 1945). 

There is little available literature on sex differences in blood pressure.
Alvarez (1923), Symonds (1923) and Shock (1944) found no evidence
for a difference before puberty. In adult life it is generally accepted that
the levels are higher for men and Alvarez claimed that after the age of 
40 the sex difference becomes smaller. Symonds found a small difference 
with higher figures in women after 40 years of age. Examination of our 
data fails to detect a significant sex variation in either systolic or diastolic 
pressure (Tables 11 and 12). 

Although both E.S.R. and systolic pressure increase significantly 
with age, the correlation between them in our data is negligible 
(r = — 0.044, not sig.). Olbrich (1950) found the decreased renal func- 
tion of old age to be more marked in the presence of a raised diastolic 
pressure. In this connection the correlation in our data between blood 
urea and diastolic pressure (r = 0.233, P~ 5%) is of some interest. 
We have found no obvious association between “ fitness” rating and 
either systolic or diastolic pressure. 

In the past preoccupation with mean values has hidden from view 
the variability inherent in all biological data. Frequency distributions 
undoubtedly change with increasing age and an unexpected observation 
in an elderly individual is not necessarily abnormal. We have shown 
(Table 14) that using acceptable adult levels of “normality” for 5 
different tests, only 5/61 individuals over the age of 70 in whom all
tests were carried out, can be regarded as normal. 

The change of variate with age is not a line drawn through the means, 
but a band whose limits are not well defined. We have shown that for 
some forms of data the scatter becomes wider with increasing age. 

This appears true not only for many types of physiological and 
clinical measurements but also for psychometric and psychomotor per- 
formance (Simonson et al 1941, Misiak 1947, Shock 1951). However 
a decreased scatter in old age seems to be characteristic of urinary excre- 
tion of ketosteroids (Wooster 1943, Hamilton and Hamilton 1948). We 
have demonstrated that extreme old age may be reached with little or 
no change in any particular measurements (Table 1). 

The heterogeneity of an ageing population is due to a complex of 
interacting factors. The variation in form and function seen in early 
life will presumably continue. Superimposed upon it is the variation in 
rate of ageing between individuals of the same age group and between 
tissues and organs of the same individual. Symptomless disorders which 
undoubtedly occur in young life increase rapidly in incidence and inten- 
sity in the later decades when the range of variation between individuals 
probably becomes larger. Since those smitten by fatal illness are removed 
by death, the later decades may be continuously improving some qualities 
for survival. Although Dorn (1950) and Harris et al (1950) have 
recently discussed the techniques of “ follow up” studies in data where 
the only item of interest is recurrence or non-recurrence of disease, as
far as we are aware there has been no corresponding study of numerical 
data of the type we have considered. 

We have shown that in old people variation from adult levels is found 
in some blood constituents and in blood pressure. Certain sex differences 
apparent in younger life diminish and may even disappear. 

A great deal of knowledge has accrued from the study of lower forms 
of life, but in order to unravel the complex mechanisms of ageing in man, 
and to define more clearly the criteria necessary for the sifting of the 
physiological from the pathological, long term studies commencing in 
infancy and continued to the last decades are urgently required. The 
life force is inherent in the germ plasma. Knowledge is required on 
how this force is spent. 


RESUME 


1. A study was made on 140 presumably healthy men and women over 
60 years of age and data presented from younger groups. Statistical
methods were used in the evaluation of the results. 
2. The haematocrit value shows no evidence for anaemia and the mean 
value is not significantly different from that of a young group. 
3. The E.S. R. is increased with ageing but the change is appreciably 
smaller in individuals who appear fittest by a subjective rating. 
4. The osmotic fragility of the R. B. C. is somewhat greater in elderly 
individuals. 
5. In old people the W. B. C. count appears to correspond to that seen 
in young adults. 
6. The blood cholesterol and whole blood chloride do not appear to 
change with age. 
7. Blood urea concentration is somewhat raised and urea concentrating 
ability lowered in old people. 
8. The mean systolic and pulse pressure show some increase with age, 
with little change in diastolic pressure. 
9. No evidence for a sex difference is found in haematocrit, E. S. R.,
blood urea or blood pressure. 
10. The literature is reviewed in the light of the findings. 


Erratum pertaining to the paper Unity of Nature, by Robert E. Bass,
in the December, 1951 issue of Human Biology. The sentence beginning 
on line 13, p. 323 should read: And our breathing supplies the green 
plants with part of the carbon dioxide they need for their growth. 


ESTIMATION OF THE T-YEAR SURVIVAL RATE
FROM FOLLOW-UP STUDIES OVER A 
LIMITED PERIOD OF TIME * 


BY ARTHUR S. LITTELL
School of Hygiene and Public Health, The Johns Hopkins University ** 


INTRODUCTION 


THE PURPOSE of this investigation is to compare the actuarial
and the maximum likelihood methods of estimating T-year sur- 
vival rates from follow-up studies in which the individuals are observed 
for varying lengths of time, and in which the length of observation is 
limited to a calendar period of time. 

A follow-up study is one in which individuals of a group which is 
exposed to some common experience (e. g. admittance to a clinic, opera- 
tion, etc.) are traced at subsequent times to determine their status in 
regard to a specific non-repetitive event (e. g. onset of a particular disease, 
death, etc.). Since it is frequently the focus of interest in such studies, 
the event will hereafter be referred to as death. The results of a follow- 
up study can usually be most efficiently described by summarizing the 
mortality experience of the group into a statistic which can be compared 
with a similarly derived statistic of another group. In studies over a 
limited period of time the statistic most frequently used for this purpose 
is a T-year survival rate, or the probability that an individual who is
exposed to the experience is alive T years later.


* Presented December 28, 1951, at a meeting of the American Statistical Asso- 
ciation in Boston. Paper No. 283 of the Department of Biostatistics, School of 
Hygiene and Public Health, The Johns Hopkins University. The investigation 
reported here was supported in part by the Milbank Memorial Fund. The writer 
is indebted to Professor William G. Cochran for the basic plan and guidance of 
this investigation, and to Doctor Margaret Merrell for helpful suggestions and
criticisms throughout this investigation. 

** Now at the School of Medicine, Western Reserve University. 

Although the actuarial method does not denote a standardized procedure,
the principal feature of its use in estimating the T-year survival
rate is division of the total period of observation into sub-intervals and 
estimation for each sub-interval of the probability that an individual 
survives. The estimate of the probability that an individual survives 
the total period is the product of the estimates of the probabilities that 
he survives each of the sub-intervals. One of the common procedures 
of application of the actuarial method has been described by Berkson 
and Gage (1). 

Throughout this investigation the (T-year) survival rate to be estimated
is that for the total length of the study (T years).


THE ACTUARIAL METHOD 


The application of the actuarial method used in this investigation is 
illustrated in Fig. 1. Fig. 1a shows three individuals who entered the
study on different dates. Individuals (i) and (ii) died before the end 
of the study, but individual (iii) survived until the closing date. The 
sub-intervals to be used for analysis are indicated on the horizontal scale 
at the foot of the diagram. 


[Figure 1, panels a and b. Axis labels: Chronological Time; Periods for Analysis. Legend: entered study; withdrawn alive.]


Fig. 1. ILLUSTRATION OF CLASSIFICATION OF INDIVIDUALS.


Fig. 1b shows how these individuals contribute to the analysis 
recorded in Table 1. All three individuals were in the study for at 
least two sub-intervals or periods. Individuals (ii) and (iii) were also
present for the entire third period, but individual (i) died during his 
third period in the study. If individual (i) had not died, he would 
have withdrawn before the end of the third period due to the closing 
of the study, and for this reason he is classified as both a death and a 
withdrawal during the third period of the study (cols. 3 and 5 of Table 1). 
(Throughout this investigation the assumption is made that no individual 
is “lost sight of” between his entering date and the closing date of the
study.) Individual (ii) died in the fourth period, but if he had not 
died he would have been observed throughout the fourth period; he is 
therefore classified as a death in the fourth period (col. 4 of Table 1). 
Individual (iii) is classified as a withdrawal during the fifth period 
(col. 3 of Table 1). 

In general terms, the total period of observation is divided into $a$
sub-intervals of equal² length $T/a$. An individual who entered the study
$T_i$ years before the closing date and who is alive on the closing date is
considered to have withdrawn in the $[aT_i/T]$-th³ sub-interval. He is
considered as an individual exposed to the risk of dying in each
sub-interval prior to the $[aT_i/T]$-th, and not exposed in any of the
subsequent sub-intervals. In the $[aT_i/T]$-th sub-interval he is exposed
to the risk of dying, but if he does die, the death might not occur before
the close of the study; therefore he is considered as “effectively” a
fraction of an individual exposed to (observable) death in the
sub-interval. If the individual dies in the $x$-th sub-interval, where
$x < [aT_i/T]$, he is exposed to the risk of dying in all sub-intervals up
to, and including, the $x$-th, but not in any subsequent sub-interval.

This classification of individuals differs from that of Berkson and
Gage in that they consider only live withdrawals. Here an individual 
is considered a withdrawal during a period if the closing of the study 
precludes his being observable throughout the period whether or not 
he dies. 

In each sub-interval the probability of dying (col. 7, Table 1) is
estimated as the number of deaths (col. 4 + col. 5) divided by the
“effective number of individuals exposed” (col. 6) in that sub-interval.
Thus if $O_x$ start the $x$-th sub-interval, $d_x$ die and $w_x$ withdraw,
the estimate of the probability of dying is

$$\frac{d_x}{O_x - c\,w_x},$$

where $(O_x - c\,w_x)$ is the “effective number of individuals exposed.”
The factor $c$ is sometimes taken as ½ (as in col. 6, Table 1); it will
be shown that this may be a poor choice.
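
As an illustration of the estimate just described, the following minimal sketch evaluates d_x / (O_x - c w_x) for each sub-interval. The counts are hypothetical and are not those of Table 1, and the function name is an assumption made for the example.

```python
def actuarial_death_probabilities(starts, deaths, withdrawals, c=0.5):
    """Estimate the probability of dying in each sub-interval.

    starts[x]      -- O_x, individuals starting the x-th sub-interval
    deaths[x]      -- d_x, deaths counted in the x-th sub-interval
    withdrawals[x] -- w_x, withdrawals in the x-th sub-interval
    c              -- fraction of a withdrawal deducted from O_x
    """
    probabilities = []
    for o_x, d_x, w_x in zip(starts, deaths, withdrawals):
        effective_exposed = o_x - c * w_x
        if effective_exposed <= 0:
            raise ValueError("effective number exposed must be positive")
        probabilities.append(d_x / effective_exposed)
    return probabilities


# Hypothetical counts for three sub-intervals (not taken from the paper).
starts = [10, 8, 5]
deaths = [1, 2, 1]
withdrawals = [1, 1, 2]

q = actuarial_death_probabilities(starts, deaths, withdrawals, c=0.5)
for period, q_x in enumerate(q, start=1):
    print(f"period {period}: estimated probability of dying = {q_x:.4f}")
```

Keeping c as a parameter rather than fixing it at ½ makes it easy to compare other choices of the factor.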


¹ In a study in which there are no individuals “lost sight of,” Dorn (3)
recommends that the withdrawals be dropped from the analysis at the end of the
period preceding withdrawal. Some information is thereby sacrificed.

² The sub-intervals need not be of equal length, but in practice are usually
taken so.

³ [y] represents the largest integer smaller than y.

In Table 1 column 9 gives the cumulative survival rates. The probability
that an individual alive at the start of the study is alive at the
end of period 4, for example, is the product of the probabilities of
surviving each of periods 1, 2, 3, and 4 (column 8): i.e.
(1.0)(1.0)(.6)(.5) = .3.
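
The cumulative survival rates are running products of the per-period survival probabilities. A minimal sketch, using only the four values quoted in the text above (the function name is an assumption for the example):

```python
from itertools import accumulate
from operator import mul


def cumulative_survival(period_survival):
    """Running product of the per-period survival probabilities."""
    return list(accumulate(period_survival, mul))


# Survival probabilities for periods 1 to 4, as quoted in the text.
period_survival = [1.0, 1.0, 0.6, 0.5]
cumulative = cumulative_survival(period_survival)

for period, rate in enumerate(cumulative, start=1):
    print(f"alive at end of period {period}: {rate:.2f}")
```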


PREVIEW 


In the subsequent treatment a constant force of mortality will be 
considered because it is the simplest type of theoretical mortality curve 
to treat. Even so, the constant force of mortality may not be a bad 
approximation to the actual force in many applications for Irwin (7) 
has shown that it is difficult to distinguish between an exponential dis- 
tribution law (constant force of mortality) and a lognormal one unless 
data are available extending for a considerable distance on either side 
of the mean. Also, Boag (2) has shown that the distribution of survival 
times following commencement of treatment of cancer patients (who 
subsequently die with cancer present) can be approximated by a lognormal 
distribution. 

In addition it will be assumed that the times of entry of individuals
into the study are spread out evenly over T years, where T is the total
period covered by the study; that is, the times of entry follow a
rectangular frequency distribution. The rectangular distribution is a
reasonable approximation to actual situations in which the event that
brings an individual into the study (e.g. an operation) is fairly
uniformly distributed throughout time.

In the subsequent sections the following results are obtained: 

(1) The maximum likelihood estimator of the T-year survival rate
and its variance are derived.

(2) The actuarial estimator with c = 1/2 is shown to be biased. The
bias is independent of sample size, but depends on the actual
force of mortality as measured by the T-year survival rate, P,
and on the number of sub-intervals, a, in the actuarial analysis.
The bias increases as P decreases, and decreases as a increases.

(3) The factor c, which is needed to give an approximately unbiased
estimate of P, is determined.

(4) Other approximations for c are developed.

(5) Greenwood’s approximation to the variance is discussed and a
rate form of variance is considered.

(6) The bias and variance of the actuarial estimator with c = 1/2 are
computed for selected values of P and a, and the variance is
compared with that of the maximum likelihood estimator.


MODEL 


Let n individuals be observed over intervals $T_i \le T$. Let the lengths
of observation times follow the rectangular frequency distribution:

$$dF(T_i) = dT_i/T, \qquad (0, T).$$

Let each of the individuals be exposed to a constant force of mortality.
That is, if an individual has survived to time t, the probability that
death occurs in the subsequent time $\Delta t$ is $\lambda\,\Delta t$, where
$\lambda$ is a constant. Then the probability that an individual survives
to time t is

$$\lim_{m \to \infty} (1 - \lambda t/m)^m = e^{-\lambda t};$$

the probability that an individual dies before time t is

$$1 - e^{-\lambda t};$$

and the probability that an individual dies between times t and t + dt is

$$\lambda e^{-\lambda t}\,dt.$$

Of the n individuals let d die after being observed for intervals $t_i$,
$i = 1, \ldots, d$; each $t_i$ being less than the corresponding $T_i$; and let
$s = n - d$ survive.


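Maximum Likelihood Solution


The contribution of an individual who dies at time $t_i$ to the likelihood
of the sample is $\lambda e^{-\lambda t_i}$, and the contribution of an individual
who survives his period of observation is $e^{-\lambda T_j}$, so the likelihood
of the sample is

$$L = \lambda^{d} \exp\Big[-\lambda\Big(\sum_{i=1}^{d} t_i + \sum_{j=1}^{s} T_j\Big)\Big].$$

Differentiating the logarithm with respect to $\lambda$, we obtain

$$\frac{\partial \log L}{\partial \lambda} = -\Big(\sum_{i=1}^{d} t_i + \sum_{j=1}^{s} T_j\Big) + \frac{d}{\lambda}.$$

The maximum likelihood estimator of $\lambda$ is

$$\hat\lambda = \frac{d}{\sum t_i + \sum T_j}. \qquad (1)$$

To obtain the variance

$$\operatorname{Var}(\hat\lambda) = \frac{1}{E(-\partial^2 \log L/\partial \lambda^2)} = \frac{\lambda^2}{E(d)}. \qquad (2)$$

The probability that the i-th individual dies is $1 - e^{-\lambda T_i}$, and the
expected number of deaths is

$$E(d) = \sum_{i=1}^{n} (1 - e^{-\lambda T_i}), \qquad \operatorname{Var}(d) = \sum_{i=1}^{n} e^{-\lambda T_i}(1 - e^{-\lambda T_i}).$$

Thus

$$E(d) = \frac{1}{T^n}\int_0^T \cdots \int_0^T \Big\{\sum_{i=1}^{n} (1 - e^{-\lambda T_i})\Big\}\, dT_1\, dT_2 \cdots dT_n
= \frac{n T^{n-1}}{T^n} \int_0^T (1 - e^{-\lambda x})\, dx
= n\Big(1 - \frac{1 - e^{-\lambda T}}{\lambda T}\Big). \qquad (3)$$

Similarly,

$$\operatorname{Var}(d) = \frac{n}{2\lambda T}\,(1 - e^{-\lambda T})^2. \qquad (4)$$

Equation (2) becomes

$$\operatorname{Var}(\hat\lambda) = \frac{\lambda^2}{n\Big(1 - \dfrac{1 - e^{-\lambda T}}{\lambda T}\Big)} \qquad (5a)$$

or

$$\sigma_{\hat\lambda} = \frac{\lambda}{\sqrt{n\Big(1 - \dfrac{1 - e^{-\lambda T}}{\lambda T}\Big)}}. \qquad (5b)$$

The maximum likelihood estimator of the T-year survival rate and its
variance are obtained by the following large sample approximation:

$$\hat P = e^{-\hat\lambda T},$$

from which

$$\operatorname{Var}(\hat P) = (-T e^{-\lambda T})^2 \operatorname{Var}(\hat\lambda)
= \frac{[P \log P]^2}{n\Big(1 + \dfrac{1 - P}{\log P}\Big)} \qquad (6a)$$

and

$$\sigma_{\hat P} = \frac{-P \log P}{\sqrt{n\Big(1 + \dfrac{1 - P}{\log P}\Big)}}. \qquad (6b)$$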
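The estimator and its large-sample standard error can be sketched in Python as follows. This is a minimal illustration, not the author's computation: the function names are hypothetical and the seven observation times are invented for the example (a study with T = 10 is assumed).

```python
import math

def mle_lambda(death_times, survivor_times):
    """lambda-hat = d / (sum of times to death + sum of survivors' observation times)."""
    d = len(death_times)
    exposure = sum(death_times) + sum(survivor_times)
    return d / exposure

def mle_survival(death_times, survivor_times, T):
    """P-hat = exp(-lambda-hat * T)."""
    return math.exp(-mle_lambda(death_times, survivor_times) * T)

def sigma_p_hat(P, n):
    """Large-sample standard error of P-hat as in equation (6b)."""
    log_p = math.log(P)
    return -P * log_p / math.sqrt(n * (1.0 + (1.0 - P) / log_p))

# Hypothetical data: 3 deaths and 4 survivors.
deaths = [2.0, 5.0, 3.0]
survivors = [9.0, 8.0, 6.0, 7.0]
lam = mle_lambda(deaths, survivors)            # 3 deaths over 40 units of exposure
p_hat = mle_survival(deaths, survivors, T=10.0)
```

With P = .6, `sigma_p_hat` gives about .066 for n = 100 and about .093 for n = 50, which agrees with the expected standard errors quoted later for the maximum likelihood estimator.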


ACTUARIAL METHOD EXPECTED VALUES 


Let n individuals, whose observation times follow a rectangular
frequency distribution, be exposed to a constant force of mortality with
the T-year survival rate $P = e^{-\lambda T}$. In an a period analysis, if there
were no deaths, n/a would be expected to withdraw in each period and
$(a - x + 1)n/a$ would be expected to be under observation at the start
of the x-th period. These numbers multiplied by the probability of
surviving all previous periods give the expected numbers starting and
withdrawing in the x-th period:

$$E(O_x) = \frac{a - x + 1}{a}\, n\, e^{-(x-1)\lambda T/a} \qquad (7)$$

$$E(w_x) = \frac{n}{a}\, e^{-(x-1)\lambda T/a}. \qquad (8)$$

The expected number of deaths among those observed throughout
the period is obtained by multiplying the expected number observed
throughout the period by the probability of dying in the period. That is

$$E(d_x) = [E(O_x) - E(w_x)](1 - e^{-\lambda T/a})
= (1 - x/a)\, n\, e^{-(x-1)\lambda T/a}\,(1 - e^{-\lambda T/a}). \qquad (9)$$

The expected number of deaths among the withdrawals is the expected
number of withdrawals multiplied by the probability that an individual
who withdraws in a period dies before he withdraws. That probability
is (from equation (3), putting T/a in place of T):

$$1 - \frac{1 - e^{-\lambda T/a}}{\lambda T/a}.$$

Thus

$$E[(wd)_x] = E(w_x)\Big(1 - \frac{1 - e^{-\lambda T/a}}{\lambda T/a}\Big)
= \frac{n}{a}\, e^{-(x-1)\lambda T/a}\Big(1 - \frac{1 - e^{-\lambda T/a}}{\lambda T/a}\Big). \qquad (10)$$

The expected probability of dying in a period is computed as

$$E(q_x) = \frac{E(d_x) + E[(wd)_x]}{E(O_x) - c\,E(w_x)}.$$

Substituting the above values and taking c = 1/2,

$$E(q_x) = \frac{(a - x)(1 - e^{-\lambda T/a}) + 1 - \dfrac{a}{\lambda T}(1 - e^{-\lambda T/a})}{a - x + 1 - \tfrac{1}{2}}$$

and

$$E(p_x) = 1 - E(q_x) = \frac{\dfrac{2a}{\lambda T}(1 - e^{-\lambda T/a}) + 2(a - x)e^{-\lambda T/a} - 1}{2(a - x) + 1}. \qquad (11)$$


Bias of the Actuarial Method with c = 1/2


Bias is the difference between the expected value of an estimator and
the value of the parameter which is being estimated. The bias in the
actuarial method of estimating the T-year survival rate when c is taken
as 1/2 depends upon the force of mortality and, as will be shown, upon
the value of the T-year survival rate and the number of sub-intervals in
the analysis.

In an a period analysis where the force of mortality is constant and
c is taken as 1/2, the expected value of the probability of survival in
the x-th period is (11)

$$E(p_x) = \frac{\dfrac{2a}{\lambda T}(1 - e^{-\lambda T/a}) + 2(a - x)e^{-\lambda T/a} - 1}{2(a - x) + 1}$$

and the bias is

$$e^{-\lambda T/a} - E(p_x) = \frac{1 + e^{-\lambda T/a} - \dfrac{2a}{\lambda T}(1 - e^{-\lambda T/a})}{2(a - x) + 1}. \qquad (12)$$

It can be shown that if $P < 1$, $E(p_x) < e^{-\lambda T/a}$, because actually
$c < 1/2$ (see Fig. 2).

The biases of the actuarial method for analyses of one, two, three,
and four periods have been computed for P, the T-year survival rate,
varying from .1 to .9 in tenths, where the bias was taken as

$$B = e^{-\lambda T} - \prod_{x=1}^{a} E(p_x).$$

These biases are presented in Table 2.
TABLE 2

Bias (True P - E[$\hat P$]) of actuarial method for 1, 2, 3, and 4-period first
approximation and 1-period quadratic approximation

                  FIRST APPROXIMATION                  QUADRATIC
                                                     APPROXIMATION *
  P    1-period  2-period  3-period  4-period          1-period
 .9     .0018     .0006     .0003     .0002            -.0007
 .8     .0074     .0023     .0012     .0007            -.0023
 .7     .0178     .0054     .0027     .0016            -.0041
 .6     .0339     .0099     .0048     .0029            -.0051
 .5     .0573     .0158     .0076     .0046            -.0042
 .4     .0904     .0233     .0111     .0066             .0003
 .3     .1372     .0323     .0150     .0088             .0109
 .2     .2059     .0419     .0189     .0109             .0326
 .1     .3183     .0486     .0212     .0120             .0788
* Used in a later discussion.


The table of biases shows

(i) For a given number of periods in the analysis, the bias increases
as the T-year survival rate decreases.

(ii) For a given T-year survival rate, the bias decreases as the
number of periods in the analysis increases.


The biases are independent of the sample size, and for a large
enough sample the ratio of the bias to the standard error of the estimate
is large. This ratio can be decreased by using a greater number of
periods in the analysis, but this method, as will be shown, also increases
the standard error of the estimate. The bias can also be corrected
to a large extent by a different choice of the factor c.
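The first-approximation biases can be recomputed from equation (11) with a short Python sketch. The function names are hypothetical; the formulas assume, as in the text, a constant force of mortality, a rectangular distribution of observation times, and c = 1/2.

```python
import math

def expected_p(x, a, P):
    """E(p_x) for c = 1/2 (equation (11)); lambda*T is recovered as -log P."""
    lam_T = -math.log(P)
    e = math.exp(-lam_T / a)                      # true survival over one sub-interval
    numerator = (2 * a / lam_T) * (1 - e) + 2 * (a - x) * e - 1
    return numerator / (2 * (a - x) + 1)

def bias_first_approx(P, a):
    """True P minus the product of E(p_x) over the a periods."""
    expected_product = math.prod(expected_p(x, a, P) for x in range(1, a + 1))
    return P - expected_product

# Biases for P = .5 with 1, 2, 3 and 4 periods.
row = [round(bias_first_approx(0.5, a), 4) for a in (1, 2, 3, 4)]
```

For example, `bias_first_approx(0.5, 1)` is about .0573 and `bias_first_approx(0.7, 2)` is about .0054, in agreement with the corresponding entries of Table 2.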


Effective Number Exposed 


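If the number of deaths, $d_w$ say, which occur among those who withdraw
after they withdraw, in a period, were known, then the probability
of dying in a period would be

$$E(d + d_w)/O.$$

Hence a value of c which formally gives an unbiased estimate of the
probability of dying in the period is given by *

$$\frac{E(d)}{O - cw} = \frac{E(d) + d_w}{O}. \qquad (13)$$

From equation (3),

$$E(d) = (O - w)(1 - e^{-\lambda T}) + w\Big(1 - \frac{1 - e^{-\lambda T}}{\lambda T}\Big),$$

$$E(d) + d_w = O(1 - e^{-\lambda T}).$$

Thus

$$c = \frac{O\Big\{w(1 - e^{-\lambda T}) - w\Big(1 - \dfrac{1 - e^{-\lambda T}}{\lambda T}\Big)\Big\}}{wO(1 - e^{-\lambda T})}
= \frac{1}{\lambda T} - \frac{e^{-\lambda T}}{1 - e^{-\lambda T}} \qquad (14a)$$

or, writing $q = 1 - e^{-\lambda T}$ and $p = 1 - q$,

$$c = -\frac{1}{\log p} - \frac{p}{q}. \qquad (14b)$$

The estimator of q is then

$$q = \frac{d}{O - cw} = \frac{d}{O - w\Big[\dfrac{1}{\lambda T} - \dfrac{e^{-\lambda T}}{1 - e^{-\lambda T}}\Big]}. \qquad (15)$$

This formally gives an unbiased estimate, but c involves unknowns,
and hence in practice the unbiasedness is not rigorous.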
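To see how far the formally unbiased factor c departs from 1/2, the expression above can be evaluated numerically. The sketch below is illustrative only; the two function names are hypothetical and simply evaluate the same quantity in terms of the force of mortality and in terms of q.

```python
import math

def c_from_force(lam_T):
    """c = 1/(lambda*T) - exp(-lambda*T) / (1 - exp(-lambda*T)), for lambda*T > 0."""
    e = math.exp(-lam_T)
    return 1.0 / lam_T - e / (1.0 - e)

def c_from_q(q):
    """The same factor written in terms of q = 1 - exp(-lambda*T), for 0 < q < 1."""
    p = 1.0 - q
    return -1.0 / math.log(p) - p / q

# Values of c over the range q = .1, .2, ..., .9.
c_values = [round(c_from_q(k / 10), 4) for k in range(1, 10)]
```

Every value in `c_values` lies below 1/2 and the factor falls as q rises, which is the sense in which c = 1/2 understates the probability of dying.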


APPROXIMATIONS TO THE “ EFFECTIVE NUMBER EXPOSED ” 


Furthermore, equation (15) involves q implicitly, so approximations
must be used in order to obtain an estimate of q. A simple approximation
is to substitute for c a linear function of q. Fig. 2 shows the
graph of c against q, and it is seen that the straight line $c = 1/2 - q/8$
is a fairly good approximation to c throughout the range $0 \le q \le .7$.

* Where d now represents $d_x + (wd)_x$ of Table 1.

Using the linear approximation to c,

$$q = \frac{d}{O - w(1/2 - q/8)};$$

solving for q

$$q = \frac{-(O - w/2) \pm \sqrt{(O - w/2)^2 + wd/2}}{w/4}.$$

Since q cannot be negative, the positive root must be taken. Thus

$$q = \frac{\sqrt{(O - w/2)^2 + wd/2} - (O - w/2)}{w/4} \qquad (16)$$

and this will be called the “quadratic approximation” to q.
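A minimal Python sketch of the quadratic approximation follows. The function names are hypothetical, and the one-period check assumes, as in the model above, that every individual "withdraws" (O = w) and that the expected number of deaths from equation (3) is substituted for d.

```python
import math

def q_quadratic(O, w, d):
    """Positive root of (w/8) q^2 + (O - w/2) q - d = 0, i.e. equation (16)."""
    if w == 0:
        return d / O                      # no withdrawals: the equation is linear in q
    half = O - w / 2.0
    return (math.sqrt(half * half + w * d / 2.0) - half) / (w / 4.0)

def expected_deaths_one_period(P, n=1.0):
    """E(d) of equation (3), written with lambda*T = -log P."""
    return n * (1.0 + (1.0 - P) / math.log(P))

def quadratic_bias_one_period(P):
    """True P minus the survival rate implied by the quadratic approximation."""
    q = q_quadratic(1.0, 1.0, expected_deaths_one_period(P))
    return P - (1.0 - q)
```

Under these assumptions `quadratic_bias_one_period(0.9)` is close to -.0007 and `quadratic_bias_one_period(0.5)` is close to -.0042, the one-period quadratic entries of Table 2.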
FIG. 2. GRAPH OF THE FACTOR C AGAINST THE
PROBABILITY OF DYING, $1 - e^{-\lambda T}$.


The bias of the quadratic approximation has been computed for a
one-period analysis and for P varying from .1 to .9 by tenths. This is
presented in Table 2 with the biases of the first approximation discussed
earlier. The bias of the quadratic approximation compares favorably
with that of the two-period first approximation except for P = .1.

Another method of approximation is to expand (16) in a series and
to take the first few terms of the series.

$$\sqrt{1 + u} = (1 + u)^{1/2} = 1 + \tfrac{1}{2}u - \tfrac{1}{8}u^2 + \cdots$$

Rewriting (16),

$$q = \frac{O - w/2}{w/4}\Bigg[\sqrt{1 + \frac{(w/2)\,d}{(O - w/2)^2}} - 1\Bigg].$$

Thus (16) in series becomes

$$q = \frac{d}{O - w/2} - \frac{w}{8}\,\frac{d^2}{(O - w/2)^3} + \cdots$$

Taking the first term of the series as a first approximation

$$q = \frac{d}{O - w/2}. \qquad (17)$$

As a second approximation

$$q = \frac{d}{O - w/2}\Big[1 - \frac{w}{8(O - w/2)} \cdot \frac{d}{O - w/2}\Big]. \qquad (18)$$

The first approximation is equivalent to taking c = 1/2, and it is
interesting to note that the first approximation can be obtained directly
from the series expansion of c. From equation (14a),

$$c = \frac{1}{\lambda T} - \frac{1}{e^{\lambda T} - 1} = \frac{1}{2} - \frac{\lambda T}{12} + \frac{(\lambda T)^3}{720} - \cdots$$

and taking the first term

$$c = 1/2.$$

This same series expansion is used by Harris et al. (6) in arguing
that c = 1/2 may be regarded as a first approximation no matter what the
force of mortality is.

When q (c = 1/2) is used as the first approximation, the necessity
of using the second approximation in analysing most data is rare, for,
as the total period is divided into sub-intervals so that the force of
mortality in each sub-interval may be regarded as constant, $q_x$ in each
sub-interval will be decreased, and in the range $0 < q_x \le .6$, the second
approximation gives an overcorrection. Furthermore, in all sub-intervals
except the last,

$$\frac{w_x}{8(O_x - w_x/2)}$$

is less than 10%, and in the last, where $w_a = O_a$,

$$\frac{w_a}{8(O_a - w_a/2)} = 25\%.$$

If, as occasionally happens, the first approximation of $q_x$ is higher than
.7, it is best to take the factor c for succeeding approximations directly
from the graph of c against $q_x$.
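The first and second approximations can be compared on a small example. The sketch below is illustrative: the function names and the counts (100 starting, 20 withdrawing, 18 deaths) are hypothetical, and the second approximation is written as the first approximation times a correction of the form discussed above.

```python
def q_first(O, w, d):
    """First approximation (c = 1/2): d / (O - w/2)."""
    return d / (O - w / 2.0)

def correction_ratio(O, w):
    """The ratio w / (8 (O - w/2)) that multiplies q in the second approximation."""
    return w / (8.0 * (O - w / 2.0))

def q_second(O, w, d):
    """Second approximation: q1 * (1 - correction_ratio * q1)."""
    q1 = q_first(O, w, d)
    return q1 * (1.0 - correction_ratio(O, w) * q1)

q1 = q_first(100.0, 20.0, 18.0)        # 18 / 90
q2 = q_second(100.0, 20.0, 18.0)
last_period_ratio = correction_ratio(40.0, 40.0)   # everyone withdraws
```

In this example the correction changes the estimate only in the third decimal place, and `last_period_ratio` reproduces the 25% figure for a period in which all who start also withdraw.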


VARIANCE 


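The estimate of the T-year survival rate is $\hat P = \prod p_x$, where $p_x$ is
the survival rate in the x-th sub-interval. Greenwood (5) proved that
the exact variance of $\hat P$ is:

$$\operatorname{Var}(\hat P) = E(\hat P)^2 \Big[\prod_x \Big(1 + \frac{\operatorname{Var}(p_x)}{E(p_x)^2}\Big) - 1\Big]. \qquad (19)$$

He also proposed that if the Var($p_x$) are small so that cross products
may be ignored, then the variance is approximately:

$$\operatorname{Var}(\hat P) \approx E(\hat P)^2 \sum_x \frac{\operatorname{Var}(p_x)}{E(p_x)^2}$$

and the sample estimate

$$\operatorname{Var}(\hat P) = \hat P^2 \sum_x \frac{\operatorname{Var}(p_x)}{p_x^2}. \qquad (20)$$

This expression will be referred to as “Greenwood’s approximation.”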
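The difference between the exact product formula and Greenwood's approximation can be seen in a short Python sketch. It assumes independent sub-interval survival rates, and the means and variances used in the example are hypothetical values chosen only for illustration.

```python
import math

def greenwood_exact(means, variances):
    """Variance of a product of independent p_x: prod(E^2) * [prod(1 + Var/E^2) - 1]."""
    mean_product_sq = math.prod(m * m for m in means)
    factor = math.prod(1.0 + v / (m * m) for m, v in zip(means, variances))
    return mean_product_sq * (factor - 1.0)

def greenwood_approx(means, variances):
    """Greenwood's approximation: prod(E^2) * sum(Var/E^2), cross products ignored."""
    mean_product_sq = math.prod(m * m for m in means)
    return mean_product_sq * sum(v / (m * m) for m, v in zip(means, variances))

# Hypothetical two-period example.
means = [0.8, 0.7]
variances = [0.004, 0.006]
exact = greenwood_exact(means, variances)
approx = greenwood_approx(means, variances)
```

The two results differ only by the cross-product term, so the approximation is slightly smaller than the exact value; for a single period the two coincide.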
There has not yet been set forth an entirely satisfactory form for
the variance of $p_x$. The form used most generally in actuarial analyses
is the binomial form in which $d_x$ is regarded as a binomial observation
from $(O_x - c\,w_x)$ trials with probability $q_x$. (In this form c is usually
taken to be 1/2.) This form is not entirely satisfactory because the deaths
observed among the withdrawn group may exceed $(1 - c)\,w_x$.

In order to avoid this difficulty, $d_x$, the deaths in a period, may
be regarded as composed of two parts: $d_a$, the deaths among those
observed throughout the sub-interval; and $d_b$, the deaths among those
who withdraw during the sub-interval. Then

$$d_x = d_a + d_b.$$

Again $d_x$ will be regarded as the random variable, and $O_x - c\,w_x$ will
be regarded as having no variation; so that

$$\operatorname{Var}(p_x) = \operatorname{Var}(d_x)/(O_x - c\,w_x)^2.$$

Now $d_a$ and $d_b$ are independent variables, so

$$\operatorname{Var}(d_x) = \operatorname{Var}(d_a) + \operatorname{Var}(d_b)$$

and

$$\operatorname{Var}(p_x) = \frac{\operatorname{Var}(d_a) + \operatorname{Var}(d_b)}{(O_x - c\,w_x)^2}. \qquad (21)$$

Expression (21) will be referred to as the “rate form” of variance of $p_x$.


The variances, rate and binomial forms, have been computed for
actuarial analyses of one, two, three, and four periods for P varying
from .1 to .9 in tenths. These variances are presented in Table 3. For
these variances Greenwood’s approximation to the variance of a product
(20) was used. Also $E(O_x) - E(w_x)/2$ was used as the denominator
in both the rate and binomial forms. The actual formulae used for
obtaining Var($p_x$) are:


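Binomial form:

$$\operatorname{Var}(p_x) = \frac{E(p_x)\,E(q_x)}{E(O_x) - E(w_x)/2};$$

substituting terms from equations (7), (8) and (11)

$$\operatorname{Var}(p_x) = \frac{4a\big[2a(1 - e^{-\lambda T/a})/\lambda T + 2(a - x)e^{-\lambda T/a} - 1\big]\big[(a - x)(1 - e^{-\lambda T/a}) + 1 - a(1 - e^{-\lambda T/a})/\lambda T\big]}{n\,[2(a - x) + 1]^3\, e^{-(x-1)\lambda T/a}}. \qquad (22)$$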

[TABLE 3. $n\operatorname{Var}(\hat P)$, rate and binomial forms, for 1, 2, 3, and
4-periods, and $n\operatorname{Var}(\hat P)$ for maximum likelihood. The table is
printed sideways in the source and its entries are not legible.]
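Rate form:

$d_a$ is a binomial observation with $p = e^{-\lambda T/a}$

$$\operatorname{Var}(d_a) = (O_x - w_x)(1 - e^{-\lambda T/a})\,e^{-\lambda T/a};$$

from equation (4), putting $w_x$ for n and T/a for T

$$\operatorname{Var}(d_b) = a\,w_x (1 - e^{-\lambda T/a})^2 / 2\lambda T;$$

substituting these and equations (7) and (8) into equation (21)

$$\operatorname{Var}(p_x) = \frac{(a - x)(1 - e^{-\lambda T/a})\,e^{-\lambda T/a}/a + (1 - e^{-\lambda T/a})^2/2\lambda T}{[(2(a - x) + 1)/2a]^2\; n\, e^{-(x-1)\lambda T/a}}. \qquad (23)$$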
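The binomial and rate forms can be evaluated side by side with the Python sketch below, which combines the per-period variances by Greenwood's approximation (20) using expected values throughout. The function names are hypothetical, and the code assumes the model of this paper (constant force of mortality, rectangular observation times, c = 1/2).

```python
import math

def period_terms(x, a, P):
    """Return E(p_x) and n*Var(p_x) in binomial and rate form for c = 1/2."""
    lam_T = -math.log(P)                          # lambda * T
    e = math.exp(-lam_T / a)                      # survival over one sub-interval
    k = 2 * (a - x) + 1
    surv_prev = math.exp(-(x - 1) * lam_T / a)    # survival through earlier periods
    p_x = ((2 * a / lam_T) * (1 - e) + 2 * (a - x) * e - 1) / k
    exposed = k / (2 * a) * surv_prev             # [E(O_x) - E(w_x)/2] / n
    var_binomial = p_x * (1 - p_x) / exposed
    var_rate = ((a - x) * (1 - e) * e / a + (1 - e) ** 2 / (2 * lam_T)) / (
        (k / (2 * a)) ** 2 * surv_prev
    )
    return p_x, var_binomial, var_rate

def n_var_P(P, a, form="binomial"):
    """n * Var(P-hat) by Greenwood's approximation, using expected values."""
    terms = [period_terms(x, a, P) for x in range(1, a + 1)]
    expected_P = math.prod(t[0] for t in terms)
    column = 1 if form == "binomial" else 2
    return expected_P ** 2 * sum(t[column] / t[0] ** 2 for t in terms)
```

As a check, `n_var_P(0.7, 2)` is about .5935, the binomial variance quoted below for P = .7 and a = 2, and the rate form for P = .6 with one period gives a standard error of about .079 when n = 100.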


Table 3 shows that, for any P, $n\operatorname{Var}(\hat P)$ increases as the number
of sub-intervals in the analysis, a, increases. (For low P’s and a going from
one to two the bias is too large for the approximation c = 1/2 to be used.
It can be shown mathematically that this increase in variance always
holds for constant force of mortality and rectangular distribution of
lengths of observation times.) This observation was at first startling
because it seemed that in increasing a (dividing the total period into
more sub-intervals) the amount of information being used in the
actuarial estimate was approaching that used in the maximum likelihood
estimate. On the other hand, in combining the estimates of the
actuarial sub-intervals, equal weights are given to the estimates of each
sub-interval although the estimates of the later sub-intervals are less
reliable. As a is increased the variance of the estimate in the last
sub-interval becomes large enough to raise the variance of the estimate of P.
That this is more a property of division into sub-intervals than a property
of the method of estimation is illustrated by the fact that if P = .6,
and the data in two-period form are analysed by the maximum likelihood
method in each of the two periods, the variance is higher than for the
maximum likelihood method applied to one period.

An approximate value for the variance of the quadratic estimator
can be obtained in the following manner.

$$q = \frac{\sqrt{(O - w/2)^2 + wd/2} - (O - w/2)}{w/4}$$

$$\operatorname{Var}(q) = (dq/dd)^2 \operatorname{Var}(d)$$

$$dq/dd = \frac{1}{\sqrt{(O - w/2)^2 + wd/2}}$$

$$\operatorname{Var}(q) = \frac{1}{(O - w/2)^2 + wd/2}\,\big[(O - w)pq - wq^2/2\log p\big].$$

Using E(d) in the denominator,

$$\operatorname{Var}(q) = \frac{(O - w)pq - wq^2/2\log p}{(O - w/2)^2 + \tfrac{w}{2}\big[(O - w)q + w(1 + q/\log p)\big]}. \qquad (24)$$
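The approximate variance of the quadratic estimator can be evaluated as in the following Python sketch. The function name is hypothetical; the expected number of deaths is taken from equation (3) applied to the withdrawals, as in the derivation above.

```python
import math

def var_q_quadratic(O, w, q):
    """Approximate Var(q) for the quadratic estimator, with E(d) in the denominator."""
    p = 1.0 - q
    log_p = math.log(p)
    var_d = (O - w) * p * q - w * q * q / (2.0 * log_p)     # Var(d_a) + Var(d_b)
    expected_d = (O - w) * q + w * (1.0 + q / log_p)        # E(d)
    return var_d / ((O - w / 2.0) ** 2 + w * expected_d / 2.0)

# One-period analysis of 50 individuals with P = .6 (so q = .4 and O = w = 50).
standard_error = math.sqrt(var_q_quadratic(50.0, 50.0, 0.4))
```

For this one-period case the standard error comes out at about .0935.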


EFFECT OF BIAS AND VARIANCE ON SAMPLE SIZE 


It has been shown that in the actuarial method with c = 1/2, the bias
and variance of the estimate of P, the T-year survival rate, vary with
P and with a, the number of sub-intervals in the analysis. The variance
also depends on the sample size, n. Table 4 indicates how the ratio of


TABLE 4 


The maximum sample size for the biases of the actuarial estimators to be less 
than 10% of the standard error 


FIRST APPROXIMATION QUADRATIC 
APPROXIMATION 
1-period 2-period 3-period 4-period 1-period 
9 587 7775 38 414 63 890 453 
8 59 806 3 645 10 580 545
7 13 203 960 2 899 228 
6 70 346 1 058 169 
5 29 148 464 267 
4 69 224 56 830 
3 34 113 37 
2 58 3 
1 27 


the bias to the standard deviation, P, a, and n are related. The entries
in the table are the largest sample sizes, by number of sub-intervals and
T-year survival rate, for which the bias is less than 10% of the standard
error, binomial form. For instance, for P = .7 and a = 2, the bias is
.005403 and the binomial variance is .59347/n. In order for the bias
to be less than 10% of the standard error

$$\frac{.005403}{\sqrt{.59347/n}} < .10,$$

$$n < 203.$$


Table 4 shows, for instance, that if P = .9, the one-period actuarial
first approximation estimator can be used for sample sizes up to about
580 with negligible bias. If two or more periods are used in the
analysis, the bias is negligible for any reasonable sample size. If P = .6
and the sample size is 200, at least a three-period analysis must be used
for the bias to be negligible.

The one-period quadratic approximation should, strictly speaking,
be compared only with the one-period first approximation; however,
for $P \le .7$, the former does look better than the two-period first
approximation.


“ ASYMPTOTIC EFFICIENCIES ” 


As has been shown, the estimators based on the actuarial method are 
inconsistent ; i. e., the bias of the actuarial estimators does not go to zero 
as the sample size becomes infinite. Therefore, asymptotic efficiencies 
cannot strictly be defined. However, if the ratios of the variances are 
computed as if the biases did not exist, some idea of the comparative 
properties of the estimators can be obtained. 

The ratios of the variance of the maximum likelihood method to the 
variances of the actuarial method, 1, 2, 3, and 4-period first approxi- 
mation and 1-period quadratic approximation are presented in Table 5. 


TABLE 5 
Ratio of maximum likelihood variance to actuarial variance,
“asymptotic efficiencies” where bias may be ignored


FIRST APPROXIMATION QUADRATIC
APPROXIMATION
$P = e^{-\lambda T}$  1-period 2-period 3-period 4-period 1-period
rate bin. rate bin. rate bin. rate bin.
9 .93 .97 .71 .72 .62 .87 .57 1.03 
8 .86 .93 .67 .69 .59 .60 .54 .54 1.04 
7 .62 .66 .55 .57 .50 .52 1.02 
.57 .63 .51 .53 .47 .48 .99 
.50 .42 .45 .93 
.41 .46 .38 .40 .84 
3 .32 .36 
+ .26 .30 


These ratios or “ asymptotic efficiencies” are presented only for those 
values of P and a for which the maximum sample size for the bias to 
be negligible is greater than 50, as presented in Table 4. 

The “asymptotic efficiencies” decline as the number of periods in
the analysis, first approximation, increases. The efficiency of the
quadratic estimator is high for P > .5, but declines rapidly for P < .4; that
is, the efficiency is high in essentially the same range in which the bias
is low. It should be remembered, however, that the quadratic estimator
presented above was designed to reduce the bias in the range $.3 \le P \le 1.0$
(cf. Fig. 2), and other quadratic approximations (linear approximations
to the factor c) will be better for more restricted or different ranges of
the T-year survival rate.


SAMPLES 


The theory and comparisons developed so far compare the “asymptotic”
properties of the various estimators, i.e., the properties of the
estimators as the sample size becomes large, ignoring the bias. Upper
limits of sample size for which the bias may be ignored (bias less than
10% of the standard error) have been computed (Table 4).

Determination of the properties of the estimators in small samples 
is arduous and unrewarding. Therefore, in the present investigation 
random samples were drawn to check the behaviour of the estimators 
in small samples. 

Forty samples each of size fifty were drawn from tables of random 
numbers (4). The samples were drawn from a population in which 
the force of mortality is constant, the T-year survival rate is P = .6,
and the lengths of observation times of individuals follow a rectangular 
frequency distribution. The samples were drawn in two sets of twenty 
with a slight difference in sampling procedure between the two sets. 


FIRST SET OF SAMPLES 


In the first set of twenty samples (Table 6), random numbers were
drawn to determine $T_i$, the length of the time the i-th individual was
observed. The probability that the i-th individual dies is the binomial
probability $1 - e^{-\lambda T_i}$. For each individual another random number was
drawn and compared with his probability of dying; if the random number
was the smaller he was considered a death, otherwise he withdrew alive.
For those who died a third random number was drawn and converted
into the time to death, $t_i$, in the following manner:

The distribution of time to death, $t_i$, given that an individual died
and could have been observed over time $T_i$ is

$$f(t_i \mid T_i)\,dt_i = \frac{\lambda e^{-\lambda t_i}}{1 - e^{-\lambda T_i}}\,dt_i, \qquad (0, T_i).$$

The random number RN

$$RN = \int_0^{t_i} f(t \mid T_i)\,dt = \frac{1 - e^{-\lambda t_i}}{1 - e^{-\lambda T_i}}$$

follows the rectangular frequency distribution (0, 1). Thus

$$t_i = -\frac{1}{\lambda}\log_e\{1 - RN(1 - e^{-\lambda T_i})\}.$$
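The three-step sampling procedure can be mimicked with a pseudo-random number generator in place of printed tables of random numbers. The Python sketch below is only an illustration of the procedure: the function names, the seed, and the choice T = 10 are assumptions, and the generator is the standard-library `random` module rather than the tables used in the study.

```python
import math
import random

def draw_individual(lam, T, rng):
    """Return (observation length T_i, time of death or None) for one individual."""
    T_i = rng.uniform(0.0, T)                     # rectangular length of observation
    prob_death = 1.0 - math.exp(-lam * T_i)
    if rng.random() < prob_death:                 # second random number: death or not
        rn = rng.random()                         # third random number
        t_i = -math.log(1.0 - rn * prob_death) / lam
        return T_i, t_i
    return T_i, None                              # withdrew alive

def draw_sample(n, P, T, seed=0):
    """Draw n individuals from a population with T-year survival rate P."""
    lam = -math.log(P) / T
    rng = random.Random(seed)
    return [draw_individual(lam, T, rng) for _ in range(n)]

sample = draw_sample(50, 0.6, 10.0, seed=1)
deaths = [t for _, t in sample if t is not None]
```

Each time of death produced this way falls inside the individual's own period of observation, as the conditional distribution requires.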


SECOND SET OF SAMPLES 
For the second set of samples it was desired to expedite the sampling
procedure. Instead of sampling for $T_i$, a distribution of times of
observation was assumed.

TABLE 7
Assumed distribution of times of observation for second set of samples

          SAMPLES HAVING ONLY
          TWO INDIVIDUALS WITH     λT_i
 T_i      SPECIFIED T_i            (λ = .05108)   P_i = e^{-λT_i}   1 - P_i

  .29     #17                      .0150          .985              .015
  .88     #16                      .0451          .956              .044
 1.47     #15                      .0751          .928              .072
 2.06     #14                      .1052          .900              .100
 2.65     #13                      .1352          .873              .127
 3.24     #20, #11                 .1653          .848              .152
 3.82                              .1953          .823              .177
 4.41     #19, #10                 .2254          .798              .202
 5.00                              .2554          .775              .225
 5.59                              .2855          .752              .248
 6.18     #18, #7                  .3155          .729              .271
 6.76                              .3456          .708              .292
 7.35     #5                       .3756          .687              .313
 7.94     #4                       .4057          .667              .333
 8.53     #3                       .4357          .647              .353
 9.12     #2                       .4658          .628              .372
 9.71     #1                       .4958          .609              .391

The interval (0, T) was divided into seventeen equal sub-intervals,
Table 7, and for each sample three $T_i$’s were taken at the mid-point of
each of these sub-intervals. In each sample one $T_i$ was dropped to make
the sample size fifty.

By assuming a set of $T_i$’s instead of sampling for them, the time
consumed in drawing the samples was reduced by 1/3, and some steps
in the further analysis were simplified; furthermore, there seems to be
no practical reason for not regarding the two sets of samples as
completely comparable.


ANALYSIS OF SAMPLES 


All samples were analysed by the following methods: maximum 
likelihood, maximum likelihood applied to each of two actuarial sub- 
intervals, actuarial first approximation for 1, 2, 3, 4, and 8 sub-intervals, 
and the actuarial quadratic approximation for 1 and 2 sub-intervals. 
The estimated survival rates are presented in Table 8 by sample and 
by method analysed. 

If $P_i$ are the sample estimates, Table 8, for a method of estimation,
then the mean of the (observed) distribution of estimates is $\sum P_i/40$.
The difference between this mean and .6, the true value of the T-year
survival rate, is an estimate of the bias of this method of estimation.
The standard deviation of the estimates about the true mean, .6,

$$\sqrt{\sum (P_i - .6)^2/40}$$

is an estimate of the standard error of this method of estimation for
samples of size 50. These values, for each method of estimation, are
presented in Table 9 along with the expected standard errors, rate and
binomial forms.

The 40 samples of 50 each were combined into 20 random samples
of 100 each, and estimates of the T-year survival rate were made by
the maximum likelihood method and by the actuarial first approximation
method using 1, 2, 3, 4, and 8 sub-intervals. The means and standard
deviations of the observed distributions of estimates and the expected
standard errors are presented in Table 10.
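The two summary statistics just defined can be computed directly. As an illustration, the Python sketch below applies them to the forty one-period actuarial first-approximation estimates as they can be read from the two-decimal column of Table 8; the variable names are hypothetical, and the bias is taken here as the true value minus the mean, following the sign convention of Table 2.

```python
import math

# One-period actuarial first-approximation estimates for samples 1 to 40,
# as read from the third column of Table 8.
estimates = [
    .68, .40, .36, .60, .76, .72, .52, .52, .52, .44,
    .60, .64, .60, .64, .40, .80, .48, .64, .52, .52,
    .56, .56, .52, .56, .52, .52, .60, .60, .48, .56,
    .72, .76, .56, .48, .44, .48, .56, .76, .48, .56,
]
TRUE_P = 0.6

mean_estimate = sum(estimates) / len(estimates)
estimated_bias = TRUE_P - mean_estimate
# Standard deviation about the true value (divisor 40, not 39).
sd_about_truth = math.sqrt(sum((p - TRUE_P) ** 2 for p in estimates) / len(estimates))
```

For this column the mean is .566, the estimated bias is .034, and the standard deviation about .6 is about .1095.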


CONCLUSIONS 
The following conclusions are suggested by Table 9: 


(1) The one-period actuarial first approximation estimator is less
efficient than the maximum likelihood estimator. The standard
error of this actuarial estimator about the true mean, .6, is

TABLE 8
Estimates of P

SAMPLE   MAX. LIKE.       ACTUARIAL 1ST APPROX.                 QUADRATIC
NO.     1-per. 2-per.   1-per. 2-per. 3-per. 4-per. 8-per.   1-per. 2-per.
1 .7051 6967 68 .6974 7078 7515 .7504 .7022 .7047 
2 .4986 4219 40 = .3841 5012 5265 .5338 .4702 .4212 
3 .4672 3881 36 3977 4694 4881 .5139 4388  .4274 
4 7016 60 .6995 7324 7229 .7274 .6336 .7047 
5 .7301 6979 76 . 6631 6460 7401 .7187 .7728 
6 .7117 6828 72 #.6570 6608 6130 .6784 .7372 .6694 
7 5758 6075 52 .6019 .6012 5710 .3590 .5668 .6114 
8 5696 5358 52 .5320 =. 5408 6194 .6292 .5668 .5504 
9 5796 5639 .52 .5213 .5666 5218 .5983 .5668 .5409 
10 .5193 5286 .44 .4947 .5270 4431 .4914 .5020  .5201 
11 . 6601 6732 ~=.60 6391 .7262 7035 .7275 .6336 .6484 
12 .6731 6786 =. 64 6401 .6389 6348 .6821 .6676 .6508 
13 .6042 .5726 .60 6119 .6650 6421 .6254 .6336 .6237 
14 . 6692 7205 64 7152 ~=.6450 6371 .7074 .6676 .7203 
15 .4462 .4671 .40 4923 .2943 5808 .5313 .4702 .5090 
16 .8096 .8734 .80 8649 .8348 8468 .8435 .8092 .8662 
17 .5661 .6522 .48 6264 .5997 5750 .4193 .5342 .6344 
18 .6607 .7010 .64 7130 §=.7062 7112 = 7175 
19 .6131 .52 6127 .6656 6390 .6622 .5668 .6226 
20 .4927 .52 4500 .5313 4547 .5547 .5668 .4788 
21 .5822 .7088 .56 6987 .6899 7090 .7204 .6000 .7069 
22 .5839 .6479 .56 6352 .5812 4613 .6575 .6000  .6441 
23 .5554 .5396 .52 5158 .4997 3602 .0000 .5668 .5361 
24 .5827 .3994 .56 3135 .5246 2862 .4474 .6000 .3778 
25 5461 .5655 .52 5648 .7291 6261 .6575 .5668 .5784 
26 .5494 .5289 .52 5023 .5397 5048 .6626 .5668 .5242 
27 .6440 .5666 .60 4840 .4491 3031 .3448 .6336 .5164 
28 .6395 .6022 .60 5656 .6831 6494 .6605 .6336 .5838 
29 .5220 .5135 .48 5110 =. 5635 4939 .5612 .5342 .5300 
30 .5334 .56 5020 =.5178 4929 .2250 .6000 .5289 
31 .7305 .6566 .72 6369 .7058 6123 .4028 .7372 .6537 
32 .7687 .8440 .76 8380 .8065 8341 .8203 .7728 .8372 
33 .6059 .6083 .56 5980 .6798 6583 .6670 .6000 .6102 
34 .5289 .5430 .48 5160 .5763 .5803 .5342 .5330 
35 .56257.  .5232 =. 44 4689 .4546 4414 .4843 .5020  .4911 
36 .5621 .4552 .48 3885 .5345 4674 .5125 .5342 .4341 
37 .5839 .7079 .56 7030 =.7024 7130 =. 7206 .7109 
38 .7695 .7391 .76 7209 .6840 6821 .7319 .7728 .7293 
39 .5371 .5732 .48 5469 ~=.4791 6168 5795 .5342 .5610 
40 .5898 .7065 .56 .7030 .7180 .7165 .7145 .6000 .7109

TABLE 10
Summary of distributions of sample estimates: 20 samples, size 100

                               MAXIMUM               ACTUARIAL
                              LIKELIHOOD         FIRST APPROXIMATION
                                1-per.    1-per.  2-per.  3-per.  4-per.  8-per.

Mean P                          .60342    .566    .58684  .60066  .58882  .59512

Standard deviation about .6     .0490     .0738   .0707   .0617   .0812   .1120
(20 degrees of freedom)

Expected standard deviation,    .0658     .0791   .0868   .0923   .0963   .1368
rate form

Expected standard deviation,              .0701   .0831   .0900   .0947   .1049
binomial form


.1095 compared with .0836 for maximum likelihood. The relative
efficiency is

$$\Big(\frac{.0836}{.1095}\Big)^2 = 58\%.$$

The expected relative efficiency is
binomial 88%
rate form 69%.


(2) The one-period actuarial quadratic approximation estimator
appears to be as good as the maximum likelihood estimator.
The relative efficiency is

$$\Big(\frac{.0836}{.0881}\Big)^2 = 90\%;$$

whereas the expected relative efficiency is

$$\Big(\frac{.0931}{.0935}\Big)^2 = 99\%.$$


(3) The standard error of estimate increases as the T-year period
is divided into sub-intervals for analysis, no matter which
estimator is used. Some of the differences between pairs of
observed standard errors were tested for significance (at the
5% level) by the Pitman test (8) (a t-test for correlated
observations), Table 11. The maximum likelihood standard
error is significantly lower than all others with the exceptions
of the 3-period first approximation (which was not tested
because it is practically the same as the 1-period first
approximation) and of the 1-period quadratic approximation (which
was tested and the difference was found not significant).


TABLE 11


Significance (5%) of differences between pairs of observed
standard deviations of the distributions of sample
estimates of the T-year survival rate


[Rows and columns: Max. Like. (1 and 2 periods), 1st Approx. (1, 2, 3, 4,
and 8 periods), Quad. Approx. (1 and 2 periods). The entries of the table
are not legible in the source.]


r: Indicates that the standard deviation of the analysis
to the left (row) is significantly lower than that of the
analysis above (column).


c: Indicates that the standard deviation of the analysis
above (column) is significantly lower than that of the
analysis to the left (row).


- Indicates that the difference is not significant.


The differences corresponding to the blank spaces above
the diagonal were not tested.


The 1-period quadratic approximation was also tested against 
1-period first approximation and 2-period quadratic approxi- 
mation and found significantly lower than each of the latter. 


(4) The evidence presented is insufficient to indicate whether the
rate form or the binomial form of variance is the better
approximation to the true variance of the actuarial first approximation
estimator.


It is also suggested by Table 10 that the standard error of the esti- 
mate increases as more sub-intervals are used in the analysis. 


THE EFFECT OF DIVISION INTO SUB-INTERVALS IN THE 
ACTUARIAL METHOD 


In reviewing the literature no clear statement was found of the 
original purpose of division of the period of observation into sub-intervals. 
The present investigation indicates that in a study in which the indivi- 
duals are observed for varying lengths of time some of the principal 
accomplishments of division into sub-intervals are: 


(1) If the over-all force of mortality is not constant, it can be 
approximated by a sequence of constant forces in successive 
sub-intervals better than by one constant force over the total 
period. 


(2) If any individuals are “lost sight of,” the length of time during 
which they are observed is fixed more accurately in the analysis 
if the total period is subdivided. 


(3) If the force of mortality is constant, the bias which is introduced 
in the estimation of the “ effective number exposed ” is reduced ; 
on the other hand, the variance of the actuarial estimate of the 
T-year survival rate is increased by subdivision. This indicates 
that division into sub-intervals should be avoided. 


DISCUSSION 


It should be pointed out that this analysis presents an extreme case 
in that the survival rate to be estimated is that for the total period of 
observation, i.e. everybody “withdraws.” In practice observations are 
frequently made over a longer period of time than indicated in the 
(T-year) survival rate. This extremity, however, brings out the differ- 
ences between the actuarial and the maximum likelihood methods and 
the effects of increasing the number of sub-intervals. 

One natural continuation of this investigation is to consider different
schedules of withdrawal. For example, one might take no entries after
a certain date prior to the date of analysis; this would ensure that over
a part of the study there would be no withdrawals.

Another natural continuation is to consider non-constant forces of 
mortality, and find how the bias and variance of the actuarial and the 
maximum likelihood (and perhaps a multiple-period maximum likeli- 
hood) methods change. 

SUMMARY 


The purpose of this investigation is to determine the properties of 
the actuarial method of estimating T-year survival rates relative to the 
maximum likelihood method in follow-up studies. The type of follow-up 
study considered here is that in which (a) the individuals are observed 
for a limited period of time; (b) the lengths of times of observation of 
the individuals follow a rectangular frequency distribution; and (c) the 
force of mortality to which the individuals are exposed is constant. 

For such studies the following results are established: 


(1) The actuarial first approximation estimator (taking as the “effective
number exposed” the number starting a period minus half
the number who withdraw during that period) gives a biased
estimate of the T-year survival rate. The bias is independent
of the sample size, increases as the T-year survival rate decreases,
and decreases as the number of sub-intervals in the analysis
increases.


(2) The variance of an estimator of the T-year survival rate increases
as the number of sub-intervals in the analysis increases. 


(3) An actuarial estimator is proposed which reduces the bias found 
in the first approximation estimator and which, for a one-period 
analysis, has a high “asymptotic efficiency” for P = .5.


(4) The bias and variance and largest sample size for which the bias 
is unimportant relative to the variance have been computed for 
selected values of the T-year survival rate and selected numbers 
of periods of analysis for the actuarial first and quadratic 
approximation estimators. 


(5) Random samples were drawn for studies with 50 and 100 indivi- 
duals. The sampling results agree with the above theoretical 
findings that the variance increases with the number of periods 
in the analysis. 
(6) An estimate of the variance of the T-year survival rate different
from the binomial form frequently used in practice was proposed.
This new estimate seems theoretically preferable to the binomial
form. On the basis of the samples, however, neither could be
chosen as better.
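The two estimators compared in this summary can be illustrated with a minimal sketch. The function names and the counts used below are hypothetical and are not taken from the paper; the maximum likelihood form shown assumes a constant force of mortality, estimated as deaths divided by total time under observation.

```python
import math

def actuarial_survival(intervals):
    """Actuarial first approximation: product of interval survival rates.

    Each interval is (number starting, deaths, withdrawals). The
    "effective number exposed" is the number starting the interval
    minus half the number who withdraw during it.
    """
    p = 1.0
    for n_start, deaths, withdrawals in intervals:
        effective = n_start - withdrawals / 2.0
        p *= 1.0 - deaths / effective
    return p

def ml_survival_constant_force(deaths, total_exposure_years, t_years):
    """Survival rate exp(-lambda * T), with lambda estimated as
    deaths per person-year (assumes a constant force of mortality)."""
    lam = deaths / total_exposure_years
    return math.exp(-lam * t_years)

# Hypothetical one-period analysis: 100 start, 20 die, 40 withdraw.
p_one_period = actuarial_survival([(100, 20, 40)])  # 1 - 20/80 = 0.75

# The same hypothetical study split into two sub-intervals.
p_two_periods = actuarial_survival([(100, 10, 20), (70, 10, 20)])

# Hypothetical exposure of 200 person-years over a 5-year period.
p_ml = ml_survival_constant_force(20, 200.0, 5.0)
```

Comparing `p_one_period` with `p_two_periods` on such made-up counts shows only how subdivision changes the arithmetic; it does not reproduce the bias and variance results established in the paper.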


This investigation indicates that for some follow-up studies the 
actuarial method has serious drawbacks if the estimator is not chosen 
carefully. The actuarial quadratic approximation estimator which was 
proposed avoids these drawbacks in these studies for certain ranges of
the T-year survival rate. 

At the present stage of this investigation, it seems feasible to say 
that for most forces of mortality a simple estimator based on the actuarial 
method can be devised which gives estimates of low bias and low variance 
in a one-period analysis. Since such estimators are available, it appears 
that division of the total period of observation into a large number of 
sub-intervals is inefficient for many forces of mortality. 
METHODOLOGY IN THE STUDY OF PHYSICAL
MEASUREMENTS OF SCHOOL CHILDREN 


PART II. 


SEXUAL MATURATION—DETERMINATION OF 
IMMATURITY POINTS 


BY A. HUGHES BRYAN AND B. G. GREENBERG 


School of Public Health, 
University of North Carolina, 
Chapel Hill 


1. INTRODUCTION 


In a previous study (1) the authors were concerned with an evaluation
of methods for the analysis of anthropometric measurements of
school children in order to ascertain the scope of differences between 
children from varying socio-economic backgrounds. This was the first 
step in a broad study of the factors, including those related to nutrition, 
which might account for observed differences (1,2) in the body size of 
children from varying socio-economic groups. 

Consideration was given in Part I to the optimal age for study, to 
the numbers of children necessary to show specified weight and height 
differences, and to which physical measurements were the most sensitive 
discriminators. It was recognized, however, that studies involving the 
anthropometry of school children should not be divorced from an inves- 
tigation of the stage of sexual maturity of the subjects measured. This 
is true since puberty, which occurs in different children at different 
chronological ages, is associated with a well-defined acceleration and 
deceleration in growth. Furthermore, it appears well established that 
children destined to mature early are taller and heavier than those who 
show a late puberty, this difference becoming evident some years before 
sexual differentiation takes place. Hence a study of growth differences
in children from different socio-economic backgrounds should include, 
in addition to an analysis of anthropometric measurements such as was 
presented in Part I, a study to determine whether or not the groups 
compared differ significantly in their ages of attainment of puberty. 

In a longitudinal study published in 1931, H. G. Richey (3) divided 
school girls into three groups in respect to the age of first menstruation: 
those whose menarche occurred before the thirteenth birthday; those 
with menarche between the thirteenth and fourteenth birthday; and those
with menarche after they had attained fourteen. A comparison of these 
three groups showed that at any chronological age from six to sixteen 
there was a tendency for girls belonging to an early maturing group to 
be heavier than members of a group characterized by a later menarche. 
The findings in respect to the group mean heights at various ages were 
similar except that the maturity groups ceased to differ at the end of 
the fifteenth year, “a fact that has led different investigators to report
both a tendency and the absence of a tendency for taller girls to reach 
puberty earlier than it is reached by shorter members of the same sex.” 
In 1937, Richey (4) published his findings on girls in greater detail, 
together with similar results from a longitudinal study of the growth of 
boys divided into maturity groups on the basis of the age at appearance 
of axillary hair. In another longitudinal study, Boas (5) observed in 
girls and boys the timing of the adolescent spurt in stature, a phenomenon 
related to puberty. He reported that “during the period of growth the 
average stature for a given year is the less, the later the maximum rate 
of growth,” and a similar statement could be made with respect to stature 
during the growing years and the age at menarche. In both boys and 
girls, adult stature was not much affected by an early or late adolescent 
spurt. A comprehensive monograph on growth patterns in girls in 
relation to the timing of the menarche was published in 1937 by Shuttle- 
worth (6), and the principal findings of Richey, Boas, and Shuttleworth 
have been uniformly confirmed by quite a number of more recent 
investigators. 

Since the magnitude of the stature and weight differences between 
groups of boys and girls classified according to early or late puberty 
may be of interest, some of the findings of two cross-sectional studies 
are presented briefly. Ellis (7) reported that boys who had attained 
pubescence at the time of examination (presence of pubic hair and/or 
pubescent genital development) and were aged 13.1 to 14.0 or 14.1 to 
15.0 were found to be 5.2 to 12.4 cm. taller and 11.4 to 20.7 lbs. heavier
than their classmates of the same age who were less mature (prepub- 
escent). Stone and Barker (8) observed a weight difference of 14 to 
19 lbs. in girls 12.1 to 14.5 years old between the group that had and 
the group that had not attained menarche at the time of study. The 
stature difference between the two maturity groups diminished with age 
and varied from 6.8 cm. for girls 12.1 to 12.5 years old to 3.5 cm. for
girls 14.1 to 14.5 years old. 

The primary purpose of this paper is to describe a method, suitable 
for cross-sectional studies, for the determination of whether or not two 
samples of boys or of girls differ significantly in their group mean age 
of sexual differentiation. For longitudinal data the average age of 
attaining puberty is most efficiently estimated by the arithmetic mean 
of the chronological ages at which the physical attributes of puberty 
appear or reach a certain stage of development. With regard to cross- 
sectional observations, Hogben, Waterhouse and Hogben (9) have shown 
that the statistical methods of biological assay are immediately trans- 
ferable to this problem. These authors determined the average ages at 
which both boys and girls displayed certain physical changes associated 
with puberty, the sequence of the occurrence of these changes, and their 
individual durations. The physical changes referred to include, among 
others, growth of pubic hair, axillary hair, development of the breast, 
and the onset of menstruation. Hogben and co-workers used probits 
and the logistic curve to calculate 30, 50 and 70 per cent immaturity 
points for each of the above physical changes. (The p per cent immaturity
point is the age at which p per cent of the children in a
homogeneous group are immature and 100 − p per cent have begun to
show changes associated with puberty; e.g., the age at which 50 per cent
of boys have no pubic hair and 50 per cent have begun to develop pubic
hair.) These authors computed in a similar manner the ages at which
30, 50 and 70 per cent of the sample were mature, or adult, in respect 
to a physical characteristic and the remainder not yet mature. In con- 
tinuing the type of inquiry started by Hogben et al., we have also used 
probits and logits to determine 30, 50 and 70 per cent immaturity points 
for some of the physical changes associated with puberty. We have found 
that under certain circumstances a simpler statistical technic, Karber’s 
method, will provide almost the same information with less laborious 
computations. This shortening of the necessary calculations should make 
the methods here described more generally acceptable. 
2. SUBJECTS AND METHODS OF EXAMINATION


To obtain data for use in the methodological investigation reported in 
this paper, we studied children living in an orphanage. The observation 
and recording of the physical changes associated with puberty were done 
by a qualified physician in the course of a routine physical examination 
of the children and without their being aware that special attention was 
being devoted to these matters. Measurements of height and weight 
were carried out at the same time by experienced technicians. In all, 
three series of observations were made: in April and May, 1950, 86 boys 
and 101 girls were examined and measured; in October and November, 
1950, 86 boys and 90 girls were studied; and again in April and May, 
1951, we collected data on 81 boys and 90 girls. For the first series of 
examinations (April-May, 1950), all children then resident in the 
orphanage and aged 9.0 to 14.1 years were included in the study, the 
only exception being a boy who had received gonadotrophic hormone for 
an undescended testicle. In the second series of observations (fall, 1950), 
all children then living in the orphanage and within the above age limits 
were included, with the exception that two of the older girls refused 
examination and four were excused. Of these six, all but two were 
sexually mature at the time of the previous examination. This difficulty 
in securing the cooperation of adolescent girls is familiar to many inves- 
tigators. In the third series of observations (April-May, 1951), only 
those children who had been studied the previous fall were included. 
In this short semi-longitudinal study we have collected data on 72 girls 
and 72 boys observed at three successive six-month intervals; for the 
purpose of this paper, however, we are analyzing the data as three cross- 
sectional observations, including all the children examined and measured 
in a given series. 

All measurements were made in the mid-morning on children clad 
only in an examining robe, and after they had been requested to void. 
Stature was determined on a Broca plane by a series of independent 
measurements by two observers, and to obviate errors in determining 
weight the scale was read independently by two observers. Chronological 
age in months at the time of the examination was calculated from the 
child’s date of birth, verified from official records in cases where there 
was some doubt.

For the purpose of judging attained sexual maturity in girls, observa- 
tions were made by a physician in the course of a routine medical 
examination. The stages of development of the breast, pubic hair and
axillary hair were recorded, and it was ascertained by questioning the
girl, and sometimes the matron, whether or not the first menstruation 
had occurred. Ratings of the development of the breast and of pubic 
hair were made on the five-point scales described and illustrated by 
Reynolds and Wines (10). Ratings of the development of axillary hair 
were made on a four-point scale described for use with boys by Ellis (7). 
In the population studied, shaving and the use of depilatories did not 
complicate the study of axillary hair. 

In judging attained sexual maturity in boys, pubic hair was rated 
using the six-point scale and axillary hair using the four-point scale of 
Ellis (7) and no difficulty was experienced in classifying boys in regard 
to those attributes. The situation was quite different in respect to 
maturity ratings of boys based on the growth and development of the 
external genitalia. Greulich et al. (11) have described and pictured a 
number of stages in the growth and development of the penis, testes, 
and scrotum which were of assistance to them, in combination with 
changes in pubic, facial and axillary hair, in assigning boys to five 
maturity groups. Schonfeld (12) has made careful measurements of the 
penis and testes of maturing boys and presents growth curves for both 
organs from birth to maturity. For obvious psychological and social 
reasons direct measurements of the genitals usually cannot be employed 
in population studies. Photographs of the nude child have been used in 
longitudinal studies of growth and sexual maturation by Reynolds and 
Wines, Greulich et al., Stolz and Stolz (13), and others. Examination 
and comparison of a series of such pictures of an individual child enables 
the investigator to determine the age at which certain genital changes 
took place. We propose, however, to carry out cross-sectional studies;
and we regard it as imperative that the rating of the genital development 
of adolescent boys and girls be a part, and not a conspicuous part, of a 
routine physical examination. In rating boys for genital development 
our physician was impressed with the continuous nature of the growth 
and morphological changes in the male generative organs. After a year’s 
experience he adopted a simple three-point classification based largely 
on the size of the penis and recorded as immature, pubescent, and 
adolescent. An arbitrary size of the penis of approximately 5.5 cm. in 
length and 1.5 cm. in width, as judged by the eye in comparison with
an object of that size, was used to classify boys as immature or pubescent 
in respect to genital development. 
Some information is available regarding the consistency of the
examining physician in judging and recording the physical changes 
associated with puberty. Maturity ratings were assigned to 72 boys at 
each of three examinations at six-month intervals, and to 75 girls at each 
of two examinations at a six-month interval. In every case the examiner 
was not aware of the rating he had previously assigned to the child, 
and it is possible to count the number of times he judged a child as less 
mature in regard to a given characteristic in a subsequent examination 
than his rating at an earlier examination would warrant. In all, approxi- 
mately 144 such comparisons between examinations are available for 
each attribute in boys, and approximately 75 in girls. Only three 
instances occurred of an apparent loss of maturity due to inconsistency 
on the part of the examiner, one in classifying a girl by breast develop- 
ment, and two in classifying boys by genital development. 


3. STATISTICAL METHODS * 


The raw data consisted of chronological ages, heights, weights, and 
the ratings of sexual maturity based on the examinations of boys and 
girls as described in the previous section. The observations selected for 
statistical analysis in this paper were those obtained on 101 girls in the 
spring of 1950 and on 86 boys in the fall of 1950. For each sex the 
children were grouped into five age classes: those who were 9, 10, 11, 12, 
13 years old at their last birthday. Data on two boys and four girls
who had passed their fourteenth birthday were not included in the
calculations. Although age intervals of one-half year might appear to 
be more desirable in some instances, the moderately small numbers of 
children in this investigation prohibited the use of such small age classes. 
As a matter of fact, the use of smaller age classes (e.g., half-year instead
of one year) greatly increases the labor of computation, usually makes 
no difference in the immaturity point obtained, and has a variable effect 
on the standard error of estimate. A more detailed consideration of 
this problem will be found in the appendix of this paper. 

As described in Section 2, the physical changes in the reproductive 
organs and pubic and axillary hair were recorded in the original data 
using various three-to-six-point systems to indicate ascending maturity 
from the infantile to the adult status. However, three stages were readily
recognized in the development of each of these physical characteristics:
(a) the prepuberal or infantile stage; (b) a transitional stage characterized
by incomplete metamorphosis; and (c) the adult stage of complete
sexual differentiation. An exception is menarche for which the
transitional stage b is absent unless the irregularity of periods characteristic
of many adolescent girls is regarded as a manifestation of transition
from immature to adult. Data on this point were not included
in the present study.

* The statistical part of this investigation was supported by a research grant
from the National Institutes of Health, U. S. Public Health Service.

For the purpose of this investigation, the end points selected for 
study were the ages at which 30, 50 and 70 per cent of the population 
were infantile (stage a) and 70, 50, or 30 per cent non-infantile (stage 
b or c) in respect to each physical attribute. To obtain these end points, 
the percentage of boys and girls still immature in respect to each attribute 
was calculated for each age class. The p per cent immaturity point was 
then estimated using, in the case of the girls, probits, logits and Karber’s 
method, and in the case of the boys Karber’s method alone. Good 
descriptions of Karber’s method may be found in the articles by Corn- 
field and Mantel (14) and by Irwin and Cheeseman (15). 
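A minimal sketch of the Karber (Spearman-Karber) computation may help to fix ideas. The function name, the use of class midpoints as ages, the divisor n − 1 in the variance, and the figures in the example are all assumptions made for illustration; they are not the data of this study, and the cited articles should be consulted for the exact form of the method.

```python
import math

def karber_50_point(ages, pct_immature, n):
    """Estimate the 50 per cent immaturity point and its standard error.

    ages         : midpoints of the age classes, in ascending order
    pct_immature : per cent of children still immature in each class
    n            : number of children examined in each class

    Assumes the first class is 100 per cent and the last class
    0 per cent immature, as the method requires.
    """
    q = [v / 100.0 for v in pct_immature]   # proportion immature
    if q[0] != 1.0 or q[-1] != 0.0:
        raise ValueError("terminal classes must be 100 and 0 per cent immature")
    p = [1.0 - v for v in q]                # proportion no longer immature
    k = len(ages)

    # Mean age of transition: each step in p is placed at the midpoint
    # between the two neighbouring age classes.
    mean = sum((p[i + 1] - p[i]) * (ages[i] + ages[i + 1]) / 2.0
               for i in range(k - 1))

    # Variance from the binomial variation of the interior classes
    # (one common form, using n - 1 as divisor).
    var = sum((((ages[i + 1] - ages[i - 1]) / 2.0) ** 2)
              * p[i] * q[i] / (n[i] - 1)
              for i in range(1, k - 1))
    return mean, math.sqrt(var)

# Hypothetical example: five one-year age classes of 20 children each.
point, se = karber_50_point([9.5, 10.5, 11.5, 12.5, 13.5],
                            [100, 80, 50, 20, 0],
                            [20, 20, 20, 20, 20])
```

Because the estimate is a simple weighted sum, it can be carried out on a desk calculator in a few steps, which is the computational advantage referred to in the text.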

Hogben, Waterhouse and Hogben (9) used both probits and the 
logistic curve to estimate the p per cent age for a specified level of 
development. These authors expressed a preference for the logistic curve. 
It appears, however, that their rationale for the rejection of the method 
of probits applies equally well to the logistic procedure. In fact, any 
symmetric sigmoidal curve would be equally objectionable if the reasons 
they advanced were valid ones. A more detailed explanation of their 
choice of the logistic curve was obtained by correspondence with one of 
the authors (16). The choice was based upon results obtained in 
calculations from their data rather than upon theoretical considerations. 

The logistic curve fitting procedure employed by the computers in 
the paper of Hogben et al. did not use a weighting scheme as advocated 
by Berkson (17) to obtain approximately the minimum χ² fit, nor did it
include the 0 and 100 per cent values. These two percentages, although 
often less reliably determined, should not be excluded in the curve fitting 
process. In designing an experiment to determine p per cent maturation 
points associated with puberal changes in children, it is best to avoid these 
terminal points, if possible. As in other applications of the methods 
of bio-assay, one should strive to obtain a distribution of observations 
such that the percentages in the terminal groups are immediately 
within the limits, 0 and 100 per cent, and there is a concentration of 
observations in the central region. If, however, 0 and 100 per cent
values occur in the observational data under analysis, they should not 
be discarded. In the weighting process associated with the method of 
probits the effect of these terminal values upon the regression line is 
diminished while the opposite is true of observations near the center of 
the distribution. 
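The weighted fitting of a logistic curve discussed above can be sketched as follows. This is an illustration only: the weights n·p·(1 − p) correspond to the approximate minimum χ² idea, and the replacement of 0 and 100 per cent by 1/(2n) and 1 − 1/(2n) is a working rule assumed here so that such classes can be retained rather than discarded. The function names and the counts are hypothetical.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def fit_logit_line(ages, immature_counts, n):
    """Weighted least-squares line of logit(proportion immature) on age."""
    xs, ys, ws = [], [], []
    for x, r, m in zip(ages, immature_counts, n):
        p = r / m
        # Working values for 0 and 100 per cent (an assumed rule).
        p = min(max(p, 1.0 / (2 * m)), 1.0 - 1.0 / (2 * m))
        xs.append(x)
        ys.append(logit(p))
        ws.append(m * p * (1.0 - p))        # weight n * p * q
    sw = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return intercept, slope

def immaturity_point(intercept, slope, pct):
    """Age at which pct per cent of the group is still immature."""
    return (logit(pct / 100.0) - intercept) / slope

# Hypothetical counts of immature children in five age classes of 20.
a, b = fit_logit_line([9.5, 10.5, 11.5, 12.5, 13.5],
                      [20, 16, 10, 4, 0],
                      [20, 20, 20, 20, 20])
points = {pct: immaturity_point(a, b, pct) for pct in (30, 50, 70)}
```

Note that the terminal classes receive small weights under this scheme, so that they influence the fitted line less than the classes near the center of the distribution.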

Tukey (18) has recently indicated that the differences between probits 
and logits are slight, if not negligible, in most applications. For example, 
in order to distinguish between the tails of the two curves at the one-half 
per cent point, at least 1600 experimental units would be required. This 
number rises quickly so that at the 15 per cent point the required number 
of subjects or animals would be 7000. Such numbers of experimental 
units appear to be far in excess of those to be expected in studies of 
growth and sexual maturity in humans. Hogben et al. reported on 662 
girls and 642 boys; the statistical work in this paper is based on observa- 
tions on 97 girls and 84 boys. 
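The closeness of the two curves can be checked numerically. The sketch below compares the integrated normal curve with a logistic curve whose scale has been matched to it; the scaling constant 1.702 is a conventional choice assumed here and does not come from the paper or from Tukey's note.

```python
import math

def normal_cdf(z):
    """Integrated normal curve (the basis of probits)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logistic_cdf(z, scale=1.702):
    """Logistic curve (the basis of logits), rescaled to match the normal."""
    return 1.0 / (1.0 + math.exp(-scale * z))

# Largest vertical gap between the two curves over a fine grid.
grid = [i / 100.0 for i in range(-500, 501)]
max_gap = max(abs(normal_cdf(z) - logistic_cdf(z)) for z in grid)
```

With this scaling the largest gap is under one per cent of the response scale, which is small beside the sampling variation of a proportion based on a few dozen children.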

With certain limitations, to be detailed later in this paper, we advocate 
the use of Karber’s method since it has the distinct advantage that it 
requires less laborious calculations, and within these limitations, produces 
satisfactory results. Other workers have favored Karber’s method, 
especially when a rapid approximation is necessary, and provided that 
certain assumptions are fulfilled. One of the more important of the 
latter appears warranted in this instance, namely, that the two terminal 
groups have 0 and 100 per cent response respectively. Irwin (19) indi- 
cated a preference for Karber’s method. Bross (20), who referred to it 
as the Spearman-Karber method, confirmed its desirability as a simple, 
rapidly calculated technic and indicated that it provided excellent esti- 
mates for small samples. Cornfield and Mantel (14) similarly expressed 
the advantages found in Karber’s method and showed that it provided 
the maximum likelihood estimate when certain conditions were fulfilled. 

In evaluating which computational procedure to employ in a given 
instance, several considerations are pertinent. These are given in order 
of relative importance: 


1) Correspondence of the actual underlying distribution with the
mathematical model, i.e., is the latter sufficiently close to the truth
for all practical applications?

2) The method of estimating the parameters (usually two) in the
procedure. Is it based upon maximum likelihood, least squares,
minimum χ², or approximate minimum χ²?

3) Computational convenience. 
In biological assay problems, Bliss (21) has advocated the integrated
normal curve because of the first consideration above. On the other hand, 
Berkson (17) has argued that the logistic curve has theoretical backing 
because of certain “physicochemical phenomena” involved. For the
analysis of puberal changes, there appears, at present, no theoretical 
reason to prefer one form or another of the sigmoid curve. One needs 
to rely, therefore, on common sense and empirical observations for the 
verification of any proposed scheme. 

Concerning the second point, method of estimating the parameters, 
the previously mentioned article by Irwin indicated that the differences 
between minimum χ² and maximum likelihood were negligible in bio-assay
problems. A similar situation appears to be true in the present
field of research.*

As regards the third point, ease of computation, Karber’s method 
possesses marked advantages over the methods of probits and logits. 


4. RESULTS 


The results of our methodological study are to be found in Tables 1 
to 5. Each table presents the 30, 50 and 70 per cent immaturity points 
for each physical characteristic for the group of girls or boys under 
consideration, together with the data from which these end points were 
calculated, namely the per cent of children in each age class still immature
in regard to the characteristic studied. Standard errors of estimate are 
recorded in the tables for the 50 per cent end points calculated by 
Karber’s method and the large sample standard errors of estimate for 
those obtained by probits and logits. In order to make a comparison 
of the three methods of computing immaturity points, the girls’ end 
points were calculated by all three procedures, and a discussion of the 
results obtained is to be found in the next section of this paper. 

The sequence and timing of the physical changes associated with
puberty in the samples of girls and boys studied are given in Table 1.
Thus for girls, utilizing 50 per cent end points computed by Karber’s
method, half of the population was immature with respect to breast
development at 10.7 ± 0.23 years, with respect to growth of pubic hair
at 11.5 ± 0.20 years, axillary hair at 12.4 ± 0.19 years, and menstruation
at 13.0 ± 0.18 years. All of the time intervals between the 50 per cent
end points for each physical characteristic are significant and indicate
that, for our sample of girls, growth of pubic hair begins, on the average,
about 0.8 year after the first development of the breast; axillary hair
appears approximately 0.9 year after the first growth of pubic hair; and
the first menstruation follows the appearance of axillary hair in 0.6 year
on the average. These findings agree well with results in the literature.
Thus, in a longitudinal study of 49 girls, Reynolds and Wines (10)
reported the following mean ages: breast development (bud stage),
10.8 ± 1.1 years; appearance of pubic hair, 11.0 ± 1.1 years; and
menarche, 12.9 ± 1.4. If these deviations are standard deviations,
they should be divided by √n to make them comparable with the
standard errors of estimate reported in our tables. Boas (5) similarly
established the age at first menstruation and the standard deviation as
13.1 ± 1.2 years for Horace Mann School girls and 13.5 ± 1.1 for
Hebrew Orphan Asylum girls. Many other studies of age at menarche
could be cited from the literature. Using biological assay methods on
cross-sectional data, Hogben, Waterhouse and Hogben (9) calculated a
number of 50 per cent immaturity points by the logistic curve including
the following: breast, 11.0 years; pubic hair, 11.8; axillary hair, 13.2;
and menarche, 13.6.

* Berkson (24), (25) has pointed out differences between the two methods of
estimation when applied to the logistic function. Both methods are iterative,
however, and Berkson suggests the approximate minimum χ² method proposed
by himself earlier.
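The conversion mentioned above for the Reynolds and Wines figures is a one-line calculation. The sketch below assumes, as the text does, that their reported deviations are standard deviations and that all 49 girls contribute to each mean; the function name is illustrative.

```python
import math

def standard_error(sd, n):
    """Standard error of a mean from a standard deviation and sample size."""
    return sd / math.sqrt(n)

se_breast = standard_error(1.1, 49)    # 1.1 / 7, about 0.16 year
se_menarche = standard_error(1.4, 49)  # 1.4 / 7 = 0.20 year
```

On this footing the longitudinal means carry standard errors of about 0.16 to 0.20 year, of the same order as the 0.18 to 0.23 year reported here for the cross-sectional 50 per cent end points in girls.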

In the case of our sample of boys (Table 1) the 50 per cent imma- 
turity points estimated by Karber’s method indicate that pubescent genital 
development (as defined in Section 2 of this paper) and the first pubic 
hair are practically simultaneous in their median age of appearance, 
13.1 ± 0.20 and 13.1 ± 0.16 years respectively. This close time relationship
between these two physical attributes of puberty in boys is to
be found in the cross-sectional study of Ellis (7) and the comprehensive 
longitudinal study of Stolz and Stolz (13). The latter authors observed 
that in 67 boys the mean age of first rapid growth of the glans penis 
was 12.68 years, the mean age of onset of the puberal spurt in stature 
was 12.76 years, and their Table 80, p. 317, indicates that at the time of 
this first acceleration of growth in stature, 52.24 per cent of their boys 
would have been rated immature in regard to pubic hair by the criteria 
used in our study. Among the 50 per cent immaturity points in boys 
calculated by Hogben et al., the following are of interest for comparison 
with our results: pubic hair, 12.9 years; axillary hair, 14.8 years. Our 
sample of boys was too young to permit a satisfactory estimate of the 
50 per cent end point for axillary hair, as can be seen from the last
column in Table 1. Only 7 per cent of the boys had ceased to be 
immature in regard to axillary hair, and to use Karber’s method to 
estimate the immaturity points would not be justified, since it would 
involve the assumption that boys in the next age class would be 0 per 
cent immature—an unlikely event. The axillary hair end points were 
computed by probits, but their approximate character is indicated by 
the large standard error obtained. 

At this point the reader should recall that our primary purpose in 
making this study was to explore methods suitable for determining 
whether or not two samples of boys or of girls drawn from presumably 
different populations differ significantly in their median age of onset of 
the somatic changes characteristic of puberty. In this regard, questions 
immediately arise concerning the discriminating power of the technics 
described in this paper. This problem of the variance of differences in 
p per cent immaturity points was tackled empirically by dividing our 
sample of boys and girls into subgroups in respect to stature and weight, 
and calculating the standard errors of the differences in 50 per cent 
immaturity points. Since we are also interested in a comparison of 
results obtained by the use of probits, logits, and Karber’s method, the 
girls’ end points were obtained by all three methods of computation. 
Our boys and girls were divided into the following subgroups: 


H+, H−  subgroups whose heights exceeded (H+), or were less than
(H−), a standard height for chronological age in months.
The standard heights used were those obtained in a series of
measurements of a very large number of American boys and
girls for garment size (22).


W+, W−  subgroups whose weights exceeded (W+), or were less than
(W−), a standard weight for chronological age in months.
Standard weights were obtained from the same source as
standard heights.


A, B, C  The regression of weight on height and age was calculated for
our sample of girls, together with the standard error of estimate.
Girls were then divided into the following three subgroups:

A Those whose observed weight exceeded expected weight
(from the above regression) plus one-half the standard
error of estimate.


B Those whose observed weight fell between the limits:
expected weight plus or minus one-half the standard error
of estimate.


C Those whose observed weight was less than the lower limit
of B.

It will be noted in Tables 2 and 3 that when our example of girls is 
divided into subgroups taller or shorter, heavier or lighter than a standard 
which approximately bisects the sample, the resulting subgroups differ 
significantly in the ages of onset of the physical characteristics of puberty. 
Hogben, Waterhouse and Hogben also observed the earlier appearance 
of sexual maturation in the half-population of girls characterized by 
greater than median height or weight. In the case of our sample, in the 
grouping H +, H— (Table 2), the taller subgroup exhibited breast 
development, pubic hair, axillary hair, and menarche 0.8 to 1.0 year 
earlier than the shorter subgroup. All of the differences in 50 per cent 
immaturity points between groups are significant except that for breast 
development which approaches significance. (Results by Karber’s method 
are here used for comparison and calculation of the standard error of 
the differences.) Similarly the subgroup of heavier girls, W +, shows 
beginning puberal changes 1.0 to 1.2 years earlier than the lighter sub- 
group, W—. The differences in 50 per cent end points for breast, pubic 
hair, and axillary hair are all significant; menarche is notably later in 
the W — subgroup, but the nature of the distribution of the proportions 
of immature girls in each age class of this subgroup is not favorable 
for computing end points. 

The results obtained in comparing 50 per cent immaturity points 
for our sample of girls subdivided by weight and height yield some 
information regarding the discriminating power of the technics used 
in this study. It would appear that the significance of a difference in 
50 per cent end points of 0.8 to 1.0 year may be established if the number 
of children in each group compared is about 50. 
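The comparison of two 50 per cent immaturity points can be illustrated by a short sketch. It assumes that the two subgroups are independent and that the large-sample normal approximation holds, so that the standard error of the difference is the root of the sum of the squared standard errors; the function name and the numbers in the usage note are hypothetical and are not taken from the tables.

```python
import math


def compare_end_points(m1, se1, m2, se2):
    """Normal-deviate test for the difference of two 50 per cent points.

    m1, m2   : estimated 50 per cent immaturity points (years)
    se1, se2 : their standard errors
    Returns (difference, se_of_difference, z, two_sided_p).
    """
    diff = m1 - m2
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)
    z = diff / se_diff
    # Two-sided tail area of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return diff, se_diff, z, p_value
```

For instance, `compare_end_points(12.5, 0.3, 11.6, 0.4)` gives a difference of 0.9 year with a standard error of 0.5 and a normal deviate of 1.8.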

The results presented in Table 4 suggest that height, rather than 
weight, is the factor most closely associated with sexual differentiation. 
In this table the girls are divided into three subgroups (A, B,C) 
according to whether their weight is greater, about the same, or less 
than would be expected from their age and height. The influence of 
height as a classifying factor is thus removed, with the result that 
the three subgroups do not differ among themselves significantly in the 
age of onset of the physical changes associated with puberty. The largest 
difference observed, that in menarche between group A and group C, 
amounts to 0.8 to 1.3 years by the three methods of computation, but is 
not significant. Very little difference is noted in first appearance of 
axillary hair in the three subgroups of girls and pubic hair seems to 
make an earlier appearance in group B than in A or C. 

When our sample of boys was divided into four subgroups, H +, 
H—, W+, W— (Table 5), the differences between 50 per cent imma- 
turity points for genital development and pubic hair were small and 
insignificant. This failure to find that the taller or heavier subgroups 
of boys entered puberty earlier than the shorter or lighter subgroups, 
as was the case with the girls, was not due to a lesser variation in height 
and weight in the sample of boys. The variances of the differences from 
the standards of the heights and weights of our boys and our girls were 
of the same order of magnitude and not significantly different; heights: 
boys, 38.60, girls, 40.83; weights: boys, 120.23, girls, 143.96. Some 
other explanation must be sought for the different results obtained in 
studying girls and boys. It appears to the authors that a possible cause 
of this discrepancy lies in the fact that the age range of our sample of 
girls, 9.0 to 13.9, included most of the puberal spurt in height and 
weight of at least half of the girls in the sample. The median age of 
menarche for our sample was found to be 13.0 years, and Shuttleworth 
(6) has shown the first menstruation occurs just after the peak of the 
puberal spurt in height and weight. Thus the subgroups characterized 
by greater stature or weight might be expected to contain more girls 
having an early puberty. With the sample of boys, the situation is 
different, since the median age of first development of the genitals and 
pubic hair, 13.1 years, is near the upper limit of the age range of the 
sample, 13.9 years. Stolz and Stolz (13) found that in boys the first 
appearance of pubic hair and genital development marked the beginning 
(not the apex) of accelerated growth in height and weight. Hence, due 
to its age restrictions, our sample of boys did not include many who were 
undergoing the accelerated growth associated with puberty. A sample 
of older boys divided into four subgroups in reference to weight and 
height might have shown differences in the median age of onset of 
puberty similar to those observed in girls. This was, in fact, the finding 
of Hogben, Waterhouse and Hogben with boys ranging in age from 8.5 
to, apparently, 16.9 years. It has been mentioned in the first section of 
this paper that previous workers have shown that boys and girls destined 
to mature early are taller and heavier some years before the onset of 
puberty than those who will show a late maturity. Thus Ellis (7) studied 
the previous growth curves of boys included in his cross-sectional study 
and found that as far back as 6 years of age the mean heights and 
weights of the early maturing groups were greater than those of the 
late maturing groups. The results presented in Table 5 do not reflect 
this previously reported difference in the prepuberal stature and weight 
of groups of boys characterized by an early or late maturity. 


5. DISCUSSION OF THE THREE METHODS OF 
COMPUTING IMMATURITY POINTS 


A study of the 30, 50 and 70 per cent end points recorded in Table 1 
indicates that all three methods of computation provide almost identical 
results. This close agreement is truly remarkable in view of the fact 
that the three methods of computation are based on different theoretical 
distributions and use different criteria for estimating the two parameters. 
The probit technic, for example, assumes an integrated normal curve and 
employs maximum likelihood; the logit method assumes a logistic curve 
and, in this case, the parameters were estimated by approximate minimum 
χ²; Karber’s method assumes a log logistic distribution for the 50 per cent 
point, and under the conditions specified by Cornfield and Mantel is a 
maximum likelihood estimate. For the determination of 30 and 70 per 
cent immaturity points by Karber’s method, it was necessary to assume 
normality so that the normal deviate for 20 per cent on either side of 
the 50 per cent point (viz., 0.524) could be used in conjunction with σ. 
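Karber's computation can be sketched in a few lines. The sketch assumes equally spaced age classes identified by their midpoints, proportions immature that run from 100 per cent in the youngest class to 0 per cent in the oldest, and observed rather than fitted proportions in the standard error; the function names are hypothetical, and σ is taken as supplied by the user since its estimation is not shown here.

```python
import math


def karber_50_point(midpoints, immature, n):
    """Karber estimate of the 50 per cent immaturity point.

    midpoints : age-class midpoints, equally spaced, youngest first
    immature  : number of immature subjects in each class
    n         : number of subjects in each class
    Returns (estimate, standard_error).
    """
    d = midpoints[1] - midpoints[0]          # class interval
    p = [r / k for r, k in zip(immature, n)]  # proportions immature
    # Equivalent to the trapezoidal mean of the age-at-maturation curve.
    m = midpoints[0] - d / 2.0 + d * sum(p)
    # Assumed variance form: d^2 * sum(p*q/n), using observed proportions.
    var = d * d * sum(pi * (1.0 - pi) / ni for pi, ni in zip(p, n))
    return m, math.sqrt(var)


def points_30_and_70(m, sigma):
    """30 and 70 per cent immaturity points under assumed normality.

    The 70 per cent point lies at a younger age than the 50 per cent
    point, the 30 per cent point at an older age.
    """
    z = 0.524  # normal deviate for 20 per cent either side of the median
    return m + z * sigma, m - z * sigma
```

With hypothetical classes centred at 9.5 to 13.5 years, 20 girls in each, and 20, 16, 10, 4 and 0 immature, the estimate is 11.5 years.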

Furthermore, it should be noted that the same remarkable similarity 
in results is found in the estimates by the three methods of the standard 
errors of the 50 per cent immaturity points. None of the three calcula- 
tion procedures can be considered more precise than the others in this 
regard since all are based on large sample estimates. No attempt was 
made to use the refinement proposed by Bliss (23) for the standard error 
of estimate by probits when certain restrictions are not fully met. 

Turning now to Table 2, in which the 97 girls are subgrouped 
according to height, it will be observed again that the three computa- 
tional methods are in general accord in their estimates of the immaturity 
points. However, two instances of lack of agreement are found in this 
table. One of these, which occurs at the 70 per cent immaturity point 
for breast in the H + girls, may be explained by the fact that the 70 
per cent point represents an extrapolation, since the girls in the youngest 
age class were already only 60 per cent immature. In the case of the 
three end points for axillary hair in H — girls, the immaturity points 
by logits are all about one year younger than those computed by the 
other two methods. This discrepancy can be traced to the gross reversal 
of the observed age class percentages (100, 100, 71, 100, 20), and in- 
directly this reversal can be related to the small age class numbers. It 
would appear to be desirable to have at least 10 subjects in each age 
class if this difficulty is to be avoided. This point is especially important 
if Karber’s method is the only one used since the estimate of the standard 
error by this method is highly erratic and greatly increased when the 
numbers of subjects in the age classes immediately surrounding the 
central area are too small. 

When the girls were subgrouped by weight, Table 3, the general 
agreement between estimates of the immaturity points by the three 
computational procedures is once more confirmed. A reversal of observed 
percentages of subjects immature with respect to axillary hair again 
occurred, this time in the W— subgroup, but did not lead to such a 
serious divergence in end point estimates as occurred in the previous case. 

The distribution of observed age-class percentages of girls immature 
in respect to menarche in the W — subgroup raises the general question 
of methods for dealing with 0 and 100 per cent points when probits and 
logits are used. It will be recalled that in Karber’s method the assump- 
tion is made that the data covers the whole range from 0 to 100 per cent. 
On the other hand, these terminal points are not conveniently handled 
in the other computational methods since they are equivalent to infinite 
probits and logits. To overcome this difficulty, we proceeded as follows: 


A. Probits. The 0 and 100 per cent values were excluded from the 
first provisional fit. After the provisional line had been obtained, the 
calculated probits for these points were used in an iterative procedure. 
It was found that three, and sometimes four, cycles of iteration were 
necessary when the subjects were so distributed that there was an 
average of ten in each age class. In some instances (e. g., W— group 
for menarche) there was only one observed point different from 0 and 
100 per cent and it was not possible to use the probit technic at all. 
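The first provisional fit described under A may be sketched as below. This covers only the provisional step, with the 0 and 100 per cent classes excluded and a weighted straight line fitted to the empirical probits; the later iterative cycles are not reproduced. The weights n·Z²/(PQ) are the usual probit weighting coefficients evaluated at the observed proportions, which is an assumption of this sketch, and the function names are hypothetical.

```python
from statistics import NormalDist

_STD = NormalDist()


def provisional_probit_line(ages, immature, n):
    """Weighted line of probit (proportion immature) on age.

    Classes with 0 or 100 per cent immature are left out, as in the
    first provisional fit. Returns (intercept, slope).
    """
    pts = []
    for x, r, k in zip(ages, immature, n):
        p = r / k
        if 0.0 < p < 1.0:
            z = _STD.inv_cdf(p)
            w = k * _STD.pdf(z) ** 2 / (p * (1.0 - p))
            pts.append((x, z + 5.0, w))  # probit = normal deviate + 5
    if len(pts) < 2:
        raise ValueError("fewer than two classes differ from 0 and 100 per cent")
    sw = sum(w for _, _, w in pts)
    xbar = sum(w * x for x, _, w in pts) / sw
    ybar = sum(w * y for _, y, w in pts) / sw
    sxx = sum(w * (x - xbar) ** 2 for x, _, w in pts)
    sxy = sum(w * (x - xbar) * (y - ybar) for x, y, w in pts)
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def age_at_percent_immature(intercept, slope, percent):
    """Age at which the fitted line gives the stated per cent immature."""
    target = _STD.inv_cdf(percent / 100.0) + 5.0
    return (target - intercept) / slope
```

The `ValueError` corresponds to the situation noted above in which only one observed point differs from 0 and 100 per cent and no line can be fitted.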


B. Logits. Several methods of handling 0 and 100 per cent values 
were tried and our experience was not encouraging with any of them. 
The plan finally adopted was somewhat similar to that used with probits. 
After fitting a line to all the non-terminal percentages, estimated values 
were obtained for the remaining points from the calculated logits. The 
line was refitted to include these points, but with percentage values 
halfway between the estimated ones and 0 or 100 per cent, whichever the 
case may have been. This method is also iterative, but not as rapidly 
convergent as with probits. In two instances, breast development in 
subgroups A and C (Table 4), a slight modification was introduced to 
fit the circumstances. 
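The plan adopted for logits can be illustrated by the following sketch. It assumes a weighted straight-line fit with the customary weights npq, and a fixed number of refitting cycles rather than a convergence test; these choices, and the function names, are assumptions of the illustration and are not taken from the computations reported here.

```python
import math


def _logit(p):
    return math.log(p / (1.0 - p))


def _expit(value):
    return 1.0 / (1.0 + math.exp(-value))


def _weighted_line(points):
    """Weighted least-squares line through (x, y, w) points."""
    sw = sum(w for _, _, w in points)
    xbar = sum(w * x for x, _, w in points) / sw
    ybar = sum(w * y for _, y, w in points) / sw
    sxx = sum(w * (x - xbar) ** 2 for x, _, w in points)
    sxy = sum(w * (x - xbar) * (y - ybar) for x, y, w in points)
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def logit_fit_with_terminal_classes(ages, immature, n, cycles=10):
    """Logit line including the 0 and 100 per cent classes.

    Step 1: fit to the non-terminal proportions only.
    Step 2: for each terminal class, take the proportion halfway between
            the value estimated from the current line and 0 or 1, then
            refit; repeat for the stated number of cycles.
    Returns (intercept, slope).
    """
    p_obs = [r / k for r, k in zip(immature, n)]
    inner = [
        (x, _logit(p), k * p * (1.0 - p))
        for x, p, k in zip(ages, p_obs, n)
        if 0.0 < p < 1.0
    ]
    a, b = _weighted_line(inner)
    for _ in range(cycles):
        pts = list(inner)
        for x, p, k in zip(ages, p_obs, n):
            if p in (0.0, 1.0):
                half = (_expit(a + b * x) + p) / 2.0
                pts.append((x, _logit(half), k * half * (1.0 - half)))
        a, b = _weighted_line(pts)
    return a, b
```

The 50 per cent immaturity point is then the age at which the fitted logit is zero, that is, `-a / b`.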


We have mentioned several times in this paper that we advocate the 
use of Karber’s method for the computation of p per cent immaturity 
points. It has the advantage that it greatly diminishes the labor of 
calculating end points, and we have shown that for data of the sort 
analyzed in this paper, end points estimated by Karber’s method com- 
pare favorably with those obtained by supposedly more precise technics 
such as probits and logits. Indeed, the reality of greater precision to be 
obtained in small samples by these more detailed calculations might be 
questioned, when precision is measured by large sample methods. 

We wish to point out certain restrictions to the use of Karber’s 
method and to do so we select as an unfortunate example the estimation 
of immaturity points for axillary hair in the girls of subgroup B, 
Table 4. If Karber’s method is to be used, the following conditions 
should be fulfilled: 


1. Is the assumption that the terminal age classes contain 0 and 
100 per cent a plausible one? In the case of the axillary hair of the 
girls of subclass B, it is not, since it is not reasonable to assume that 
the next older subclass, those fourteen at their last birthday, would be 
0 per cent immature in this respect. This assumption is inherent in 
Karber’s method, and use of the method led to an under-estimate of the 
immaturity points. 


2. Do the age class percentages decrease in a fairly regular order? 
In the example under discussion there is a gross reversal in observed age 
class percentages, and this reversal occurs near the 50 per cent point 
where it does the most damage. 


3. Is there a minimum of 10 subjects in each age class, particularly 
in the central region surrounding the 50 per cent point? For the 
example under discussion, again the answer is no. This third condition 
is, of course, closely related to the second condition since gross reversal 
in observed points is most likely to occur when class numbers are small. 


Finally, in fairness to Karber’s method, it should be stated that 
probits and logits also lack precision when conditions 2 and 3 are not 
fulfilled. This failure accounts for the rather large standard errors of 
estimate by these methods when they were used to obtain the 50 per cent 
end point for axillary hair in girls of subclass B. 
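The three conditions listed above lend themselves to a simple screening routine, sketched here. The function name and the returned keys are hypothetical, and the test for regularity is the strictest possible one (no reversal at all), which is an assumption; a looser rule could be substituted.

```python
def karber_conditions(percent_immature, class_sizes, min_n=10):
    """Screen age-class data against the three conditions for Karber's method.

    percent_immature : observed per cent immature, youngest class first
    class_sizes      : number of subjects in each age class
    Returns a dict of booleans, one per condition.
    """
    # 1. Terminal classes should be 100 and 0 per cent immature.
    terminal = percent_immature[0] == 100 and percent_immature[-1] == 0
    # 2. Percentages should not increase from one class to the next.
    regular = all(
        a >= b for a, b in zip(percent_immature, percent_immature[1:])
    )
    # 3. Each class should hold at least min_n subjects.
    adequate = all(k >= min_n for k in class_sizes)
    return {
        "terminal_0_and_100": terminal,
        "regular_decrease": regular,
        "min_class_size": adequate,
    }
```

Applied to the percentages 100, 100, 71, 100, 20 quoted earlier for axillary hair in the H — girls, with hypothetical class sizes, the routine reports that the first two conditions fail.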


6. SUMMARY AND CONCLUSIONS 


The primary purpose of this investigation was to explore methods, 
suitable for use in cross-sectional studies, for the determination of 
whether or not two samples of boys or of girls differ significantly in 
their median age of onset of the somatic changes characteristic of 
puberty. Such a method would find use in research into the factors 
responsible for the differences in body size of children from varying 
socio-economic backgrounds. It is well established that children undergo 
sexual maturation at different ages, that puberty is associated with 
acceleration followed by deceleration in growth, and that children 
destined to mature early are taller and heavier some years before 
puberty than are those who will show a late maturity. 

Hogben, Waterhouse and Hogben utilized the methods of biological 
assay to determine the p per cent immaturity points for different physical 
attributes of puberty in boys and girls. We have extended their work 
to include a comparison of three methods of computing the immaturity 
points, logits, probits, and Karber’s method, utilizing small samples of 
children. We have shown that Karber’s method, which involves much 
less labor for the computer, yields end points which compare favorably 
with those obtained by the other methods of calculation. Three limita- 
tions to the use of Karber’s method in estimating p per cent immaturity 
points are set forth, and the problem of the effect of the size and number 
of age class groupings on the end point and its standard error of esti- 
mate has been considered. In regard to the discriminating power of the 
technics described in this paper, it has been shown that the significance 
of a difference in 50 per cent immaturity points of 0.8 to 1.0 year may 
be established with samples of girls numbering about 50. 

Incidental to this methodological study, the median age of appear- 
ance of breast development, pubic hair, axillary hair, and menstruation 
has been determined in a sample of 97 girls; and the median age for 
pubescent genital development and pubic hair in a sample of 84 boys. 
The end points so determined are in satisfactory agreement with similar 




140 A. HUGHES BRYAN AND B. G. GREENBERG 


determinations in the literature. We have confirmed the findings of 
others with regard to the appearance of the signs of sexual maturity 
in girls taller or heavier than the median at an earlier age than is the 
case with shorter or lighter girls. 


The authors wish to acknowledge the technical assistance of Celia &. 
Webb, Jean G. Wall, and Mary C. Robertson in collecting and analyzing 
the data on which this paper is based and to express thanks to Robert G. 
Hoffmann for supervising the calculations and to Betty Tatum for the 
endless hours of computation. 


APPENDIX 


The problem of the number and size of age classes. 


In Section 2 of this paper we mentioned the problem of whether 
more numerous age classes with a smaller range of ages in each class 
would be advantageous in computing p per cent immaturity points and 
their standard errors of estimate. The first statement to be made is 
that computational labor increases in almost direct proportion to the 
number of age classes. Insofar as the estimate of the end point is 
concerned, the number and size of age classes usually make no difference, 
provided, of course, that the number of subjects in one class interval does 
not become so small as to cause erratic fluctuations in the observed 
percentages of those with or without an attribute. 

The standard error of estimate, however, is influenced to a certain 
degree by the size of the class intervals used. 

First, smaller class intervals are desirable when a larger interval is 
divided into two smaller intervals and the division splits the number of 
subjects equally between the two smaller classes. This can be seen 
by examining the standard error of estimate associated with Karber’s 
method. The square of the standard error of estimate of the 50 per cent 
end point is the sum of $p_i q_i / n_i$, where $p_i$ is determined from a 
line fitted to the observed percentages. The contribution in original 
units to the value of $\sum p_i q_i / n_i$ at point a, when the observations 
are grouped by yearly or half-yearly class intervals, can be shown to be: 


YEAR INTERVALS: 

$$\frac{4 p_a q_a}{n_a}$$ 

HALF-YEAR INTERVALS: 

$$\frac{(p_a + \Delta)(q_a - \Delta)}{n_a/2} + \frac{(p_a - \Delta)(q_a + \Delta)}{n_a/2}$$ 


In this formulation $\Delta$ is the increment or decrement in the observed 
percentages resulting from groups which are slightly younger and older 
in the half-year age classes than in the age class of one year. The 
coefficient 4 in the expression for the year interval is to code the variation 
at point a into original units. The improvement in precision in 
determining the standard error of estimate is measured by the sum of values 
of the type $4\Delta^2/n_a$. 
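The algebra behind this term can be written out as a short check. It assumes that the $n_a$ subjects of the yearly class divide equally, $n_a/2$ to each half-year class, and that the two half-year proportions are $p_a + \Delta$ and $p_a - \Delta$, with $q_a = 1 - p_a$:

```latex
\begin{align*}
\frac{(p_a+\Delta)(q_a-\Delta)}{n_a/2} + \frac{(p_a-\Delta)(q_a+\Delta)}{n_a/2}
  &= \frac{2}{n_a}\Bigl[(p_a q_a - p_a\Delta + q_a\Delta - \Delta^2)
     + (p_a q_a + p_a\Delta - q_a\Delta - \Delta^2)\Bigr] \\
  &= \frac{4 p_a q_a}{n_a} - \frac{4\Delta^2}{n_a}.
\end{align*}
```

The half-year contribution thus falls short of the yearly contribution by exactly $4\Delta^2/n_a$, whatever the sign of $\Delta$.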

Secondly, smaller class intervals may be desirable or undesirable 
when reclassification results in unequal numbers of subjects in the 
smaller age classes. For example, using the same notation (dropping 
the subscript a for convenience) the contribution in original units to 
the square of the error of estimate at one point for both types of 
classification is as follows: 


YEAR INTERVALS: 

$$\frac{4pq}{n}$$ 

HALF-YEAR INTERVALS: 

$$\frac{(p + \Delta)(q - \Delta)}{n_1} + \frac{(p - d)(q + d)}{n_2}$$ 

where $n_1 + n_2 = n$, $n_1 > n_2$, and $\Delta$ and $d$ have the same sign. 


It is easily seen that $4pq/n$ is smaller than $[pq/n_1 + pq/n_2]$, and 
therefore the variance of the estimate using half-year class intervals is 
greater unless one or both of the additional terms in the expression for the 
contribution to variance with half-year intervals is negative and large 
enough to cancel out any advantage derived from the use of the whole 
year interval. It can be shown that this may occur whenever the two 
percentages for the half-year intervals undergo a major shift from the 
average percentage for the whole year interval. This shift is represented 
by the symbols $\Delta$ and $d$, where $\Delta$ is associated with the larger 
of $n_1$, $n_2$. For any particular case, since $d$ is a function of $\Delta$ 
and other constants in the example, one can determine how large $\Delta$ 
must be, both positive and negative, to cause the contribution to the 
variance with the half-year intervals to be less than the contribution with 
the whole year intervals. As a general rule, the critical value of $\Delta$ 
will be considerably smaller when the half-year percentage for the greater 
of $n_1$, $n_2$ moves in the direction of the 50 per cent point. On the other 
hand, if $\Delta$ becomes too large and causes $p + \Delta$ to move beyond 
the 50 per cent point, the half-year interval may again become undesirable. 


FIG. 1. CONTRIBUTION TO SQUARE OF STANDARD ERROR OF ESTIMATE BY 
TWO PERCENTAGES IN HALF-YEAR INTERVALS PLOTTED AGAINST 
THE PERCENTAGE ASSOCIATED WITH LARGER OF $n_1$, $n_2$. 


This situation is best described by an example. With observations 
classified into whole year age intervals, let $p$ = 70/100 = 0.7000. The 
contribution to the variance at this point with a yearly classification is 
0.00840 units. Suppose that now the observations are reclassified into 
half-year age classes and $n_1$, the number of subjects in the half-year 
class having the larger number, is 60; the possible values of $p_1$ associated 
with $n_1$ range from 0.50 to 1.00. The contribution to the variance, under 
these circumstances, can be plotted against $p_1$ (Fig. 1). It can be seen 



ANTHROMETRY OF SCHOOL CHILDREN 143 


from the figure that when $p_1$ is greater than 0.6667 and less than 0.8500, 
it is disadvantageous to use half-year age classes since the contribution 
to the variance is greater than would be the case with whole year age 
classes. 
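The example can be reproduced numerically with a short sketch. It assumes that the number of immature subjects in the yearly class (70 of 100) is preserved on reclassification, so that the proportion in the smaller half-year class follows from the proportion chosen for the larger one; the function names are hypothetical.

```python
def yearly_contribution(p, n):
    """Contribution 4pq/n of one yearly class, in half-year units."""
    return 4.0 * p * (1.0 - p) / n


def half_year_contribution(p, n, n1, p1):
    """Contribution of the two half-year classes replacing one yearly class.

    p, n : proportion immature and number of subjects in the yearly class
    n1   : number of subjects in the larger half-year class
    p1   : proportion immature in that class
    The smaller class (n2 = n - n1) receives whatever proportion keeps
    the total number immature unchanged.
    """
    n2 = n - n1
    p2 = (n * p - n1 * p1) / n2
    if not 0.0 <= p2 <= 1.0:
        raise ValueError("p1 is incompatible with the yearly total")
    return p1 * (1.0 - p1) / n1 + p2 * (1.0 - p2) / n2


def disadvantage_of_half_years(p, n, n1, p1):
    """Positive when half-year classes give the larger contribution."""
    return half_year_contribution(p, n, n1, p1) - yearly_contribution(p, n)
```

With p = 0.70, n = 100 and 60 subjects in the larger class, the yearly contribution is 0.00840; under the stated assumption the half-year contribution exceeds it for proportions in the larger class between about 0.667 and 0.847, in rough agreement with the limits read from the figure.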

The decision as to whether to use half-year or whole year class inter- 
vals in a given case is difficult since one does not know how a change 
in classification will influence the fitted percentages. Some help may be 
derived from a study of the observed percentages when the data is 
classified into larger or smaller class intervals. In the case of the data 
analyzed in this paper, the numbers of subjects in the whole year class 
intervals were too few to permit further subdivision. Hogben, Water- 
house and Hogben (9) used half-year age classes. 